
1 Introduction

The notion of scalable frame has been investigated in recent years [4, 10, 15, 17], where the focus was mostly on characterizing frames whose vectors can be rescaled so as to produce a tight frame. For completeness, we recall that a set of vectors F = { f i } i = 1 M in some (finite dimensional) Hilbert space \(\mathscr{H}\) is a frame for \(\mathscr{H}\) if there exist two constants 0 < A ≤ B < ∞ such that

$$\displaystyle{A\|x\|^{2} \leq \sum _{ i=1}^{M}\vert \langle x,f_{ i}\rangle \vert ^{2} \leq B\|x\|^{2}}$$

for all \(x \in \mathscr{ H}.\) When A = B the frame is said to be tight, and if, in addition, A = B = 1 it is termed a Parseval frame. When F = { f i } i = 1 M is a frame, we shall abuse notation and denote by F again the n × M matrix whose i th column is f i , where n is the dimension of \(\mathscr{H}\). Using this notation, the frame operator is the n × n matrix \(S = FF^{{\ast}}\), where \(F^{{\ast}}\) is the adjoint of F. It is folklore that F is a frame if and only if S is a positive definite operator; the optimal lower frame bound, A, coincides with the smallest eigenvalue of S, while the optimal upper frame bound, B, equals the largest eigenvalue of S. We refer to [6, 7, 20] for more details on frame theory.
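As a quick numerical illustration of these notions, the frame operator and the optimal frame bounds can be computed directly; the short NumPy sketch below uses a small, hypothetical frame of our own rather than one from the text.

import numpy as np

# Columns of F are the vectors of a hypothetical frame for R^2.
F = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

S = F @ F.T                              # frame operator S = F F^*
eigenvalues = np.linalg.eigvalsh(S)      # eigenvalues of the positive definite matrix S
A, B = eigenvalues[0], eigenvalues[-1]   # optimal lower and upper frame bounds
print(A, B)                              # 1.0 and 3.0 here; the frame is tight exactly when A = B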

It is apparent that tight frames are optimal frames in the sense that the condition number of their frame operator is 1. We recall that the condition number of a matrix A, denoted κ(A), is defined as the ratio of its largest and smallest singular values, i.e., κ(A) = σ max(A)∕σ min(A). By analogy, for a frame \(\{f_{i}\}_{i=1}^{M} \subseteq \mathscr{ H}\) with optimal frame bounds A and B, we define the condition number of the frame to be the condition number of its associated frame operator, κ({f i }): = κ(S) = B∕A. In particular, if a frame is Parseval, then its condition number equals 1. In fact, a frame is tight if and only if its condition number is 1. Scalable frames were precisely introduced to turn a non-optimal (non-tight) frame into an optimal one by simply rescaling the length of each frame vector. More precisely,

Definition 1 ([16, Definition 2.1]).

A frame {f i } i = 1 M in some Hilbert space \(\mathscr{H}\) is called a scalable frame if there exist nonnegative numbers s 1, …, s M such that {s i f i } i = 1 M is a Parseval frame for \(\mathscr{H}\).

It follows from the definition that a frame {f i } i = 1 M is scalable if and only if there exist scalars s i  ≥ 0 so that

$$\displaystyle{\kappa \left (\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right ) = 1.}$$

To date, various equivalent characterizations of scalable frames have been proved, and attempts to measure how close to scalable a non-scalable frame is have been offered [4, 15, 17, 21]. In particular, if a frame is not scalable, then one can naturally measure how “not scalable” the frame is by computing

$$\displaystyle{ \min _{s_{i}\geq 0}\left \|I_{n} -\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right \|_{ F}, }$$
(1)

as proposed in [8], where \(\left \|\cdot \right \|_{F}\) denotes the Frobenius norm of a matrix. Other measures of scalability were also proposed by the same authors. However, it is not clear that, when a frame is not scalable, an optimal solution to (1) yields a frame {s i f i } that is as well conditioned as possible. Recently, the relationship between the solution to this problem and the condition number of a frame has been investigated in [5]. In particular, Casazza and Chen show that the problem of minimizing the condition number of a scaled frame

$$\displaystyle{ \min _{s_{i}\geq 0}\kappa \left (\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right ), }$$
(2)

is equivalent to solving the minimization problem

$$\displaystyle{ \min _{s_{i}\geq 0}\left \|I_{n} -\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right \|_{ 2}, }$$
(3)

where \(\left \|\cdot \right \|_{2}\) is the operator norm of a matrix. Specifically, they show that any optimizer of (2) is also an optimizer of (3); vice versa, any optimizer of (3) also minimizes the condition number in (2). Furthermore, they show that the optimal solution to (1) need not even be a frame, in which case the condition number of the corresponding system is undefined.

In this chapter, we consider numerical solutions to the scalability problem. Recall that a frame \(F =\{ f_{i}\}_{i=1}^{M} \subset \mathscr{ H}\) is scalable if and only if there exist scalars {s i } i = 1 M ⊂ [0, ∞) such that

$$\displaystyle{\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}} = I.}$$

Consequently, the condition number of the scaled frame \(\tilde{F} =\{ s_{i}f_{i}\}_{i=1}^{M}\) is 1. We are thus interested in investigating the solutions to the following three optimization problems:

$$\displaystyle{ \min _{s_{i}\geq 0\,,\,s\neq \mathbf{0}}\dfrac{\lambda _{\max }\left (\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right )} {\lambda _{\min }\left (\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right )}. }$$
(4)
$$\displaystyle{ \min _{\begin{array}{c} s_{i} \geq 0\,,\,s\neq \mathbf{0} \\ \sum _{i=1}^{M}s_{i}^{2}\left \|f_{i}\right \|_{2}^{2} = N\end{array} }\lambda _{\max }\left (\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right )-\lambda _{\min }\left (\sum _{ i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right ). }$$
(5)
$$\displaystyle{ \min _{s_{i}\geq 0\,,\,s\neq \mathbf{0}}\left \|I_{N} -\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right \|_{ F}. }$$
(6)

Our motivation stems from the fact that, as the existing literature on scalable frames suggests, the set of all such frames is relatively small; see, e.g., [17]. As a result, one is interested in scaling a frame in an optimal manner, for example by minimizing the condition number of the scaled frame (4), or the gap in the spectrum of the scaled frame (5). Furthermore, one can try to relate the optimal solutions to these two problems to the measures of scalability introduced in [8], of which (1) is a typical example.

In addition, we investigate these optimization problems from a practical point of view: the existence of fast algorithms to produce optimal solutions. As such, we are naturally led to consider these problems in the context of convex optimization. We recall that in such a setting one wants to solve for \(s^{{\ast}} =\mathop{ \mathrm{arg\,min}}\limits _{s}f(s)\) for a real convex function \(f: X \rightarrow \mathbb{R} \cup \{\infty \}\) defined on a convex set X. Using the convexity of f and X, it follows that:

  1. If s is a local minimum of f, then it is a global minimum.

  2. The set of all (global) minima is convex.

  3. If f is a strictly convex function and a minimum exists, then the minimum is unique.

In addition, the convexity of f and X allows the use of convex analysis to produce fast, efficient algorithmic solvers; we refer to [2] and the references therein for more details.

We point out that (4) is equivalent to (2) simply by the definition of the condition number of a frame. The condition number function κ, however, is not convex, so it is nontrivial to find the optimal solution of (4). Nevertheless, κ is a quasiconvex function (see [1, Theorem 13.6] for a proof), meaning that its lower level sets are convex; that is, the set {X: κ(X) ≤ a} is convex for any real a ≥ 0. See [12] and the references therein for a survey of algorithms that can numerically solve certain quasiconvex problems, and [19] for a survey of results on optimizing the condition number. We note that, while minimizing the condition number κ is not a convex problem, an equivalent convex problem was considered in [18]. For comparison and completeness we state one of the main results of [18]. First, observe that if X is a symmetric positive semidefinite matrix, then its condition number is defined as

$$\displaystyle{\kappa (X) = \left \{\begin{array}{ll} \lambda _{\max }(X)/\lambda _{\min }(X)&\text{if }\lambda _{\min }(X)> 0, \\ \infty &\text{if }\lambda _{\min }(X) = 0\text{ and }\lambda _{\max }(X)> 0, \\ 0 &\text{if }X \equiv 0. \end{array} \right.}$$
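This convention is straightforward to implement; the following NumPy helper (our own sketch, not part of [18]) returns κ(X) for a symmetric positive semidefinite matrix, using a small tolerance to decide when an eigenvalue is numerically zero.

import numpy as np

def psd_condition_number(X, tol=1e-12):
    # Condition number of a symmetric positive semidefinite matrix,
    # following the convention stated above.
    lam = np.linalg.eigvalsh(X)
    lam_min, lam_max = lam[0], lam[-1]
    if lam_max <= tol:        # X is (numerically) the zero matrix
        return 0.0
    if lam_min <= tol:        # singular but nonzero
        return float('inf')
    return lam_max / lam_min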

In this setting, it was proved in [18] that the problem of minimizing the condition number is equivalent to a problem that can be solved by convex programming.

Theorem 1 ([18], Theorem 3.1).

Let \(\varOmega \subseteq \mathscr{ S}^{N}\) be some nonempty closed convex subset of \(\mathscr{S}^{N}\) , the space of N × N symmetric matrices and let \(\mathscr{S}_{+}^{N}\) be the space of symmetric positive semidefinite N × N matrices. Then the problem of solving

$$\displaystyle{\kappa ^{{\ast}} =\inf \{\kappa (X): X \in \mathscr{ S}_{ +}^{N}\cap \varOmega \}}$$

is equivalent to the problem of solving

$$\displaystyle{ \lambda ^{{\ast}} =\inf \{\lambda _{\max }(X): X \in t\varOmega,\,t \geq 0,\,X\succeq I\}, }$$
(7)

that is, \(\lambda ^{{\ast}} =\kappa ^{{\ast}}\).

The problem described by (7) can be restated as solving for optimal scalars {s i } satisfying

$$\displaystyle{ \min _{s_{i}\geq 0\,,\,s\neq \mathbf{0}}\left \{\lambda _{\max }\left (\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right )\,\left \vert \,\lambda _{\min }\left (\sum _{ i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right ) \geq 1\right.\right \}. }$$
(8)

Therefore, when we obtain numerical solutions to the condition number problem (4), we actually solve (8), and Theorem 1 guarantees that the optimal values of the two problems are indeed equal.

Theorem 1 has an intuitive interpretation. Suppose κ(X) = κ ∗ . Rescaling X by a positive factor 1∕t rescales its eigenvalues by the same factor 1∕t, leaving the condition number κ(X∕t) unchanged. Therefore, without loss of generality, we can assume that X is rescaled so that λ min(X∕t) ≥ 1, which is imposed in the last condition of (7). Once we know that λ min(X∕t) is at least 1, minimizing the condition number of X∕t is equivalent to minimizing λ max(X∕t), so long as X∕t ∈ Ω, which is guaranteed by the first condition in (7).
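Problem (8) can be posed directly in an off-the-shelf convex modeling language. The sketch below uses CVXPY in Python (our choice for illustration; the numerical experiments later in this chapter use the CVX toolbox in MATLAB), with the variable w standing for the squared weights s i 2 ; by Theorem 1, the optimal value returned equals the minimal condition number.

import numpy as np
import cvxpy as cp

def min_condition_scaling(F):
    # Sketch of problem (8): minimize lambda_max(S) subject to lambda_min(S) >= 1,
    # where S = sum_i w_i f_i f_i^* and w_i >= 0 plays the role of s_i^2.
    N, M = F.shape
    w = cp.Variable(M, nonneg=True)
    S = cp.Variable((N, N), symmetric=True)
    constraints = [S == F @ cp.diag(w) @ F.T, cp.lambda_min(S) >= 1]
    problem = cp.Problem(cp.Minimize(cp.lambda_max(S)), constraints)
    problem.solve()
    return np.sqrt(w.value), problem.value   # scalings s_i and the minimal condition number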

The goal of this chapter is to investigate the relationship among the solutions to the optimization problems (4), (5), and (6). In addition, we shall investigate the behavior of the optimal solution to each of these problems vis-à-vis the projection of a non-scalable frame onto the set of scalable frames. We shall also describe a number of algorithms to solve some of these problems and compare their performance. Finally, we shall apply some of the results on frame scalability to the problem of reweighting a graph in such a way that the condition number of the resulting Laplacian is as small as possible. The chapter is organized as follows. In Section 2 we investigate the three problems stated above and compare their solutions, and in Section 3 we consider the application to finite graph reweighting.

2 Non-scalable Frames and Optimally Conditioned Scaled Frames

We begin by showing the relationship between the three formulations of this scalability problem. We shall first show the equivalence of these problems when a frame is exactly scalable, and present toy examples of the different solutions obtained when a frame is only approximately scalable.

Lemma 1.

Let F ={ f i } i=1 M be a frame in \(\mathbb{R}^{N}\) . Then the following statements are equivalent:

  (a) F ={ f i } i=1 M is a scalable frame.

  (b) Problem (4) has a global minimum solution, s ={ s i }, with objective function value 1.

  (c) Problem (5) has a global minimum solution, s ={ s i }, with objective function value 0.

  (d) Problem (6) has a global minimum solution, s ={ s i }, with objective function value 0.

Proof.

Assume F is scalable with weights, {s i } i = 1 M. Then \(\widetilde{S} =\sum _{ i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}} = I_{N}\), and the largest and smallest eigenvalue of the scaled frame operator is 1,

$$\displaystyle{ \dfrac{\lambda _{\max }\left (\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right )} {\lambda _{\min }\left (\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right )} = \dfrac{\lambda _{\max }\left (\widetilde{S}\right )} {\lambda _{\min }\left (\widetilde{S}\right )} = 1. }$$

Conversely, assume problem (4) has a global minimum solution, {s i } i = 1 M, with objective function value 1. Since λ max ≥ λ min, this forces λ max = λ min = A for some A > 0. Applying this feasible solution as a scaling of F, we have

$$\displaystyle{ \widetilde{S} =\sum _{ i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}} = AI_{ N}. }$$

By normalizing the feasible solution by the square-root of A, we have the Parseval scaling,

$$\displaystyle{\{\tilde{s}_{i}\}_{i=1}^{M} = \left \{ \dfrac{1} {\sqrt{A}}s_{i}\right \}_{i=1}^{M}.}$$

We have just proved that (a) and (b) are equivalent.

Assume F is scalable with weights, {s i } i = 1 M. Then \(\widetilde{S} =\sum _{ i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}} = I_{N}\), and the difference between the largest and smallest eigenvalue of the scaled frame operator is 0,

$$\displaystyle{ \lambda _{\max }\left (\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right ) -\lambda _{\min }\left (\sum _{ i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right ) =\lambda _{\max }\left (\widetilde{S}\right ) -\lambda _{\min }\left (\widetilde{S}\right ) = 0. }$$

Additionally \(N = tr(I_{N}) =\sum _{ i=1}^{M}s_{i}^{2}\left \|f_{i}\right \|_{2}^{2}\) which shows that {s i } i = 1 M is a feasible solution for (5).

Conversely, assume problem (5) has a global minimum solution, {s i } i = 1 M, with objective function value 0. Since λ max ≥ λ min, this forces λ max = λ min = A for some A > 0. Applying this feasible solution as a scaling of F, we have

$$\displaystyle{ \widetilde{S} =\sum _{ i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}} = AI_{ N}. }$$

But the feasibility condition \(\sum _{i=1}^{M}s_{i}^{2}\left \|f_{i}\right \|_{2}^{2} = N\) implies N = tr(AI N ), hence A = 1. We have just proved that (a) and (c) are equivalent.

Assume F is scalable with weights, {s i } i = 1 M. Then \(\widetilde{S} =\sum _{ i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}} = I_{N}\), and the objective function for (6) attains the global minimum,

$$\displaystyle{ \left \|I_{N} -\sum _{i=1}^{M}s_{ i}^{2}f_{ i}f_{i}^{{\ast}}\right \|_{ F} = \left \|I_{N} - I_{N}\right \|_{F} = 0. }$$

Conversely, assume problem (6) has a global minimum solution, {s i } i = 1 M, with objective function value 0, i.e., \(\left \|I_{N} -\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right \|_{F} = 0\). This implies that \(\widetilde{S} =\sum _{ i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}} = I_{N}\), so we have a Parseval scaling. We have just proved that (a) and (d) are equivalent.

Remark 1.

Lemma 1 asserts that the problem of finding optimal scalings, {s i } i = 1 M, for a given scalable frame F = { f i } i = 1 M is equivalent to finding the absolute minima of the following optimization problems:

  • \(\min _{s_{i}\geq 0\,,\,s\neq \mathbf{0}}\dfrac{\lambda _{\max }\left (\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right )} {\lambda _{\min }\left (\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right )}\)

  • \(\min _{\begin{array}{c} s_{i} \geq 0\,,\,s\neq \mathbf{0} \\ \sum _{i=1}^{M}s_{i}^{2}\left \|f_{i}\right \|_{2}^{2} = N\end{array} }\lambda _{\max }\left (\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right )-\lambda _{\min }\left (\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right )\)

  • \(\min _{s_{i}\geq 0\,,\,s\neq \mathbf{0}}\left \|I_{N} -\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right \|_{F}\)

Lemma 1 is restrictive in that it requires the frame F = { f i } i = 1 M to be scalable in order to state the equivalence among the problems; when the frame is not scalable, there can be a wide variance in the solutions obtained. Even nearly tight frames vary in their initial feasible solutions. We briefly consider ɛ-tight frames and analyze the distance from the minimum possible objective function value.

Let F ɛ  = { g i } i = 1 M with \(\left \|g_{i}\right \|_{2} = 1\) for all i be an ɛ-tight frame such that,

$$\displaystyle{(1-\varepsilon )I_{N}\preceq \sum _{i=1}^{M}g_{ i}g_{i}^{{\ast}}\preceq (1+\varepsilon )I_{ N}.}$$

We first consider the case in which the frame cannot be conditioned any further, so that the optimal scaling weights are s i  = 1. Evaluating the three objective functions at this scaling shows how the resulting values differ:

$$\displaystyle{ \begin{array}{l} \dfrac{\lambda _{\max }\left (\sum _{i=1}^{M}s_{i}^{2}g_{i}g_{i}^{{\ast}}\right )} {\lambda _{\min }\left (\sum _{i=1}^{M}s_{i}^{2}g_{i}g_{i}^{{\ast}}\right )} = \dfrac{\lambda _{\max }\left (\sum _{i=1}^{M}g_{i}g_{i}^{{\ast}}\right )} {\lambda _{\min }\left (\sum _{i=1}^{M}g_{i}g_{i}^{{\ast}}\right )} = \dfrac{1+\varepsilon } {1-\varepsilon } = 1 + \dfrac{2\varepsilon } {1-\varepsilon }. \\ \lambda _{\max }\left (\sum _{i=1}^{M}s_{i}^{2}g_{i}g_{i}^{{\ast}}\right ) -\lambda _{\min }\left (\sum _{i=1}^{M}s_{i}^{2}g_{i}g_{i}^{{\ast}}\right ) = (1+\varepsilon ) - (1-\varepsilon ) = 2\varepsilon \\ \lambda _{\max }\left (\sum _{i=1}^{M}s_{i}^{2}g_{i}g_{i}^{{\ast}}\right ) =\lambda _{\max }\left (\sum _{i=1}^{M}g_{i}g_{i}^{{\ast}}\right ) = 1 +\varepsilon. \end{array} }$$

We lack the information necessary to give exact results for formulation (6), so we instead give an upper bound when s i  = 1.

$$\displaystyle{ \begin{array}{rcl} \left \|I_{N} -\sum _{i=1}^{M}s_{i}^{2}g_{i}g_{i}^{{\ast}}\right \|_{F}& = &\left \|I_{N} -\sum _{i=1}^{M}g_{i}g_{i}^{{\ast}}\right \|_{F} \\ & \leq &\sqrt{N}\left \|I_{N} -\sum _{i=1}^{M}g_{i}g_{i}^{{\ast}}\right \|_{2} \\ & \leq &\varepsilon \sqrt{N}.\end{array} }$$

It makes sense that one may enforce a constraint such as λ min ≥ 1 in (8), since one can always renormalize the frame elements by the reciprocal of the square root of the smallest eigenvalue of the frame operator. It is not true, though, that the scalings produced by the different formulations must be the same. Moreover, when the constraint on the smallest eigenvalue is not used, the scalings can vary wildly.

Remark 2.

For general frames, the optimization problems (4)–(6) do not produce tight frames. However, they can be solved using special classes of convex optimization algorithms: problems (4) and (5) are solved by semidefinite programs (SDPs), whereas problem (6) is solved by a quadratic program (QP); see [2] for details on SDPs and QPs. In the following we state these programs explicitly.

SDP 1 – Operator Norm Optimization:

$$\displaystyle{ (t^{1},s^{(1)}) = argmin_{\begin{array}{c} t,s_{ 1},\ldots,s_{M} \geq 0 \\ \sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}- tI_{N} - I_{N} \leq 0 \\ \sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}} + tI_{N} - I_{N} \geq 0 \end{array} }t }$$
(9)

This SDP implements the optimization problem (3). In turn, as shown by Casazza and Chen in [5], the solution to this problem is also an optimizer of the condition number problem (4). Conversely, assume s (∗) is a solution of (4). Let \(A =\lambda _{\min }\left (\sum _{i=1}^{M}(s_{i}^{(\ast )})^{2}f_{i}f_{i}^{{\ast}}\right )\) and \(B =\lambda _{\max }\left (\sum _{i=1}^{M}(s_{i}^{(\ast )})^{2}f_{i}f_{i}^{{\ast}}\right )\), and let \(r = \frac{2} {A+B}\). Then the weights defined by \((s_{i}^{(1)})^{2} = r(s_{i}^{(\ast )})^{2}\) solve (9), and the optimal value of the optimization criterion is t 1 = rB − 1 = 1 − rA.
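A sketch of this program with CVXPY (again our illustration, not the authors' implementation) is given below; rather than introducing the auxiliary variable t and the two semidefinite constraints of (9) explicitly, it minimizes the operator norm in (3) directly, which is the same problem in epigraph form.

import numpy as np
import cvxpy as cp

def sdp1_operator_norm(F):
    # Sketch of problem (3) / SDP 1: minimize ||I_N - sum_i w_i f_i f_i^*||_2
    # over w_i >= 0, where w_i stands for s_i^2.
    N, M = F.shape
    w = cp.Variable(M, nonneg=True)
    S = F @ cp.diag(w) @ F.T
    problem = cp.Problem(cp.Minimize(cp.sigma_max(S - np.eye(N))))
    problem.solve()
    return np.sqrt(w.value), problem.value   # scalings s_i and the optimal value t^1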

SDP 2 – Minimum Upper Frame Bound Optimization:

$$\displaystyle{ (t^{2},s^{(2)}) = argmin_{\begin{array}{c} t,s_{ 1},\ldots,s_{M} \geq 0 \\ \sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}- I_{N} \geq 0 \\ \sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}- tI_{N} \leq 0 \end{array} }t }$$
(10)

This SDP implements the optimization problem (8), which, as previously discussed, also produces a solution s (2) to (4). Conversely, assume s (∗) is a solution of (4). Let \(A =\lambda _{\min }\left (\sum _{i=1}^{M}(s_{i}^{(\ast )})^{2}f_{i}f_{i}^{{\ast}}\right )\) and \(B =\lambda _{\max }\left (\sum _{i=1}^{M}(s_{i}^{(\ast )})^{2}f_{i}f_{i}^{{\ast}}\right )\), and let \(r = \frac{1} {A}\). Then the weights defined by \((s_{i}^{(2)})^{2} = r(s_{i}^{(\ast )})^{2}\) solve (10), and the optimal value of the optimization criterion is \(t^{2} = \frac{B} {A}\).

SDP 3 – Spectral Gap Optimization:

$$\displaystyle{ (t^{3},v^{3},s^{(3)}) = argmin_{\begin{array}{c} t,v,s_{ 1},\ldots,s_{M} \geq 0 \\ \sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}- tI_{N} \leq 0 \\ \sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}- vI_{N} \geq 0 \\ \sum _{i=1}^{M}s_{i}^{2}\left \|f_{i}\right \|_{2}^{2} = N \end{array} }t-v }$$
(11)

This SDP implements the optimization problem (5). As remarked earlier, (5) is not equivalent to any of (3), (4), or (8). A spectral interpretation of these optimization problems is as follows. SDP 1 (and implicitly (4) and (8)) scales the frame so that the largest and smallest eigenvalues of the scaled frame operator are equidistant from, and as close as possible to, the value 1. SDP 3 scales the frame so that the largest and smallest eigenvalues of the scaled frame operator are as close to one another as possible while the average eigenvalue is set to 1. Equivalently, the solution to SDP 3 also minimizes the following criterion:

$$\displaystyle{\frac{\lambda _{max}(\tilde{S}) -\lambda _{min}(\tilde{S})} { \frac{1} {N}tr(\tilde{S})} }$$

where \(\tilde{S} =\sum _{ i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\) is the scaled frame operator.
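A CVXPY sketch of SDP 3 (our illustration, with the variable w again standing for the squared weights s i 2 ) reads as follows; the equality constraint fixes the trace of the scaled frame operator to N, i.e., its average eigenvalue to 1.

import numpy as np
import cvxpy as cp

def sdp3_spectral_gap(F):
    # Sketch of problem (5) / SDP 3: minimize lambda_max(S) - lambda_min(S)
    # subject to trace(S) = sum_i w_i ||f_i||^2 = N, with w_i >= 0 playing the role of s_i^2.
    N, M = F.shape
    w = cp.Variable(M, nonneg=True)
    S = cp.Variable((N, N), symmetric=True)
    norms_sq = np.sum(F**2, axis=0)          # ||f_i||_2^2 for each column of F
    constraints = [S == F @ cp.diag(w) @ F.T,
                   cp.sum(cp.multiply(w, norms_sq)) == N]
    problem = cp.Problem(cp.Minimize(cp.lambda_max(S) - cp.lambda_min(S)), constraints)
    problem.solve()
    return np.sqrt(w.value), problem.value   # scalings s_i and the minimal spectral gap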

QP 4 – Frobenius Norm Optimization:

$$\displaystyle{ s^{(4)} = argmin_{\begin{array}{c} s_{ 1},\ldots,s_{M} \geq 0 \end{array} }\sum _{i,j=1}^{M}s_{ i}^{2}s_{ j}^{2}\vert \langle f_{ i},f_{j}\rangle \vert ^{2}-2\sum _{ i=1}^{M}s_{ i}^{2}\left \|f_{ i}\right \|_{2}^{2}+N }$$
(12)

This program implements the optimization problem (6): the objective in (12) equals \(\left \|I_{N} -\sum _{i=1}^{M}s_{i}^{2}f_{i}f_{i}^{{\ast}}\right \|_{F}^{2}\), and it is a quadratic program in the variables w i  = s i 2.
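A corresponding CVXPY sketch (our illustration) minimizes the Frobenius distance in (6) directly; squaring that objective and expanding it yields exactly the quadratic form in (12), with w i  = s i 2.

import numpy as np
import cvxpy as cp

def qp4_frobenius(F):
    # Sketch of problem (6) / QP 4: minimize ||I_N - sum_i w_i f_i f_i^*||_F
    # over w_i >= 0, where w_i stands for s_i^2.
    N, M = F.shape
    w = cp.Variable(M, nonneg=True)
    S = F @ cp.diag(w) @ F.T
    problem = cp.Problem(cp.Minimize(cp.norm(S - np.eye(N), 'fro')))
    problem.solve()
    return np.sqrt(w.value), problem.value   # scalings s_i and the Frobenius distance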

Example 1.

Consider the 5-element frame, \(X \subseteq \mathbb{R}^{3}\), generated such that each coordinate is a random integer from 0 to 5.

$$\displaystyle{X = \left [\begin{array}{ccccc} 2&4&1&4&4\\ 3 &1 &2 &0 &2 \\ 1&4&3&5&2 \end{array} \right ]}$$

We then numerically compute X κ , X g , and X F , which are the rescaled frames that minimize problems SDP 1, SDP 3, and QP 4, respectively. That is, X κ is the rescaled frame \(X_{\kappa } =\{ s_{i}^{{\ast}}f_{i}\}\) such that \(s^{{\ast}} =\{ s_{i}^{{\ast}}\}\) is the minimizer of Problem (3), which also minimizes the frame condition number κ. Similarly, X g is rescaled to minimize the eigenvalue gap λ max − λ min while the average eigenvalue is 1, and X F is rescaled to minimize the Frobenius distance to the identity matrix.

In our numerical implementation we used the CVX toolbox in MATLAB [11], a modeling system for specifying and solving convex optimization problems.

Let s κ , s g , and s F denote the scaling vectors that determine the frames X κ , X g , and X F , respectively. That is, \(X_{\kappa } = XS_{\kappa }^{1/2}\), where S κ is the diagonal matrix with diagonal entries given by s κ , and so on. We obtained the scalings

$$\displaystyle{\begin{array}{r l l l l l} s_{\kappa } =&[0.0187,&0,&0.0591,&0.0122,&0.0242], \\ s_{g} =&[0.0875,&0,&0.0398,&0.0297,&0], \\ s_{F} =&[0.0520,&0,&0.0066,&0.0177,&0].\end{array} }$$

The results comparing each of the four frames are summarized in Table 1. Observe that each of the three methods can produce widely varying spectra.

Table 1 Comparisons of extreme eigenvalues, condition number, relative spectral gap, Frobenius distance to identity, and the operator norm distance to identity for the non-scalable frame X and its rescaled versions that minimize Problems (4)–(6).

We now describe special conditions under which a frame’s condition number can be decreased, using matrix perturbation theory.

Lemma 2 (Weyl’s Inequality, [23, Corollary 4.9]).

Let A be a Hermitian matrix with real eigenvalues {λ i (A)} i=1 d arranged in nondecreasing order, and let B be a Hermitian matrix of the same size as A with eigenvalues {λ i (B)} i=1 d . Then for any i = 1,…,d we have

$$\displaystyle{\lambda _{i}(A + B) \in [\lambda _{i}(A) +\lambda _{1}(B),\lambda _{i}(A) +\lambda _{d}(B)].}$$

An immediate corollary of Weyl’s inequality tells us that perturbing a matrix by a positive semidefinite matrix will cause the eigenvalues to not decrease.

Corollary 1.

Let A be a Hermitian matrix with real eigenvalues {λ i (A)} i=1 d and let B⪰0 be Hermitian and of the same size of A. Then for any i = 1,…,d, we have λ i (A) ≤λ i (A + B). The inequality is strict if B ≻ 0 is positive definite.

Lemma 3.

Let f be an eigenvector of A with associated eigenvalue λ. Let B be a matrix of the same size as A with the property that Bf = 0. Then f is an eigenvector of A + B with eigenvalue λ.

Lemma 4 ([24, Section 1.3]).

Let A and B be two N × N Hermitian matrices. Then for any i = 1,…,N, the mapping t ↦ λ i (A + tB) is Lipschitz continuous with Lipschitz constant \(\left \|B\right \|_{2}\).

Corollary 2.

Let A be an N × N Hermitian matrix with simple spectrum and minimum eigengap δ > 0, i.e.,

$$\displaystyle{\delta =\min _{i\neq j}\vert \lambda _{i} -\lambda _{j}\vert.}$$

Let B be a positive semidefinite Hermitian matrix of the same size as A. Then the eigenvalues of A + tB interlace with those of A:

$$\displaystyle{\lambda _{1}(A) \leq \lambda _{1}(A + tB) \leq \lambda _{2}(A) \leq \lambda _{2}(A + tB) \leq \cdots \leq \lambda _{N-1}(A + tB) \leq \lambda _{N}(A) \leq \lambda _{N}(A + tB)}$$

for \(t \in (0, \frac{\delta } {\left \|B\right \| _{ 2}} )\).

The following theorem gives conditions under which we can guarantee that the condition number of a frame can be reduced.

Theorem 2.

Let \(F =\{ f_{i}\}_{i=1}^{m} \subseteq \mathbb{C}^{d}\) be a frame that is not tight and whose frame operator has simple spectrum with minimal eigengap δ > 0. Suppose that there exists some index k such that f k is orthogonal to the eigenspace corresponding to λ max (FF ) and not orthogonal to the eigenspace corresponding to λ min (FF ). Then there exists a rescaled frame \(\tilde{F} =\{ s_{i}f_{i}\}_{i=1}^{m}\) satisfying \(\kappa (\tilde{F}) <\kappa (F)\) . In particular, one scaling that decreases the condition number is

$$\displaystyle{s_{i} = \left \{\begin{array}{l l} \frac{m} {m-1+\sqrt{1+\gamma }},&\text{ for }i\neq k \\ \frac{m\sqrt{1+\gamma }} {m-1+\sqrt{1+\gamma }},&\text{ for }i = k \end{array} \right.}$$

for \(\gamma \in (0,\delta \left \|f_{k}\right \|^{-2})\).

Proof.

Let f k denote the frame element described in the assumptions of the theorem. For \(\gamma \in (0,\delta \left \|f_{k}\right \|^{-2})\), consider the frame operator \(HH^{{\ast}} = FF^{{\ast}} +\gamma f_{k}f_{k}^{{\ast}}\), which corresponds to the rescaled frame of F in which each scale s i  = 1 except for \(s_{k} = \sqrt{1+\gamma }\). The matrix \(f_{k}f_{k}^{{\ast}}\) is Hermitian and positive semidefinite, so by Corollary 1 we have \(\lambda _{i}(FF^{{\ast}}) \leq \lambda _{i}(HH^{{\ast}})\) for every i = 1, …, d. Moreover, \(\gamma \left \|f_{k}f_{k}^{{\ast}}\right \|_{2} =\gamma \left \|f_{k}\right \|^{2} <\delta\), so Corollary 2 applies and the eigenvalues of the frame operator \(HH^{{\ast}}\) satisfy the following interlacing property:

$$\displaystyle{\lambda _{1}(FF^{{\ast}}) \leq \lambda _{ 1}(HH^{{\ast}}) \leq \lambda _{ 2}(FF^{{\ast}}) \leq \lambda _{ 2}(HH^{{\ast}}) \leq \cdots \leq \lambda _{ d}(FF^{{\ast}}) =\lambda _{ d}(HH^{{\ast}}),}$$

where the last equality follows from Lemma 3 and the fact that f k is orthogonal to the eigenspace corresponding to \(\lambda _{d}(FF^{{\ast}})\).

Furthermore, since f k is not orthogonal to the eigenspace corresponding to \(\lambda _{1}(FF^{{\ast}})\), the smallest eigenvalue strictly increases, i.e., \(\lambda _{1}(FF^{{\ast}}) <\lambda _{1}(HH^{{\ast}})\). We can now compute

$$\displaystyle{\kappa (FF^{{\ast}}) = \frac{\lambda _{d}(FF^{{\ast}})} {\lambda _{1}(FF^{{\ast}})}> \frac{\lambda _{d}(HH^{{\ast}})} {\lambda _{1}(HH^{{\ast}})} =\kappa (HH^{{\ast}}).}$$

Finally, we renormalize the scales {s i } by the constant factor \(m(m - 1 + \sqrt{1+\gamma })^{-1}\) to preserve the property that i = 1 m s i  = m. This renormalization scales all eigenvalues by the same factor which leaves the condition number unchanged. The frame

$$\displaystyle{\tilde{F} = \frac{m} {m - 1 + \sqrt{1+\gamma }}H}$$

is the frame described in the statement of the theorem, which concludes the proof.
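The following NumPy sketch illustrates Theorem 2 on a small frame of our own choosing that satisfies its hypotheses (the third vector e 2 is orthogonal to the top eigenvector of \(FF^{{\ast}}\) and not orthogonal to the bottom one); it applies the scaling from the theorem and confirms that the condition number decreases.

import numpy as np

# Columns f_1, f_2, f_3 of a hypothetical frame for R^2 satisfying the hypotheses of Theorem 2.
F = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
S = F @ F.T                                        # frame operator, here diag(2, 1)
eigs = np.linalg.eigvalsh(S)
delta = np.min(np.diff(eigs))                      # minimal eigengap (= 1 here)
k = 2                                              # 0-based index of f_k = e_2
gamma = 0.5 * delta / np.linalg.norm(F[:, k])**2   # any gamma in (0, delta * ||f_k||^{-2}) works

m = F.shape[1]
s = np.full(m, m / (m - 1 + np.sqrt(1 + gamma)))   # scales for i != k
s[k] = m * np.sqrt(1 + gamma) / (m - 1 + np.sqrt(1 + gamma))

S_tilde = (F * s**2) @ F.T                         # scaled frame operator sum_i s_i^2 f_i f_i^*

def cond(X):
    e = np.linalg.eigvalsh(X)
    return e[-1] / e[0]

print(cond(S), cond(S_tilde))                      # 2.0 versus approximately 1.33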

Remark 3.

Having discussed the equivalence between the formulations above, we have seen that they do not necessarily produce similar solutions. This brings to the forefront the question of which formulation one should use in general. One could answer this question by seeking a metric that best describes the distance of a frame to the set of tight frames. This is similar to the Paulsen problem [3]: after solving one of the formulations above, we obtain a scaling, and hence a new frame, and wish to determine the distance of this new frame to the canonical Parseval frame associated with our original frame. In [8], the question of the distance to Parseval frames was generalized to include frames that can be made tight with a diagonal scaling, resulting in the distance between a frame and the set of scalable frames:

$$\displaystyle{ d_{F} =\min _{\varPsi \in \mathscr{SC}(M,N)}\|F -\varPsi \|_{F}. }$$
(13)

However, because the topology of the set of scalable frames \(\mathscr{SC}(M,N)\) is not yet well understood, computing d F is almost impossible for a non-scalable frame. A direction for future work is to use the optimal solutions to the three problems stated above to produce bounds on d F .

3 Minimizing Condition Number of Graphs

In this section we outline how to apply and generalize the optimization problems from Section 2 in the setting of (finite) graph Laplacians. This task is not as simple as directly applying the condition number minimization problem (4), or the others, to graph Laplacian operators.

Recall that any finite graph has a corresponding positive semidefinite Laplacian matrix with eigenvalues {λ k } k = 0 N−1 and eigenvectors {f k } k = 0 N−1. Furthermore, any graph has smallest eigenvalue λ = 0 with multiplicity equal to the number of connected components in the graph, and the corresponding eigenvectors are constant functions supported on those connected components. Because any Laplacian’s smallest eigenvalue equals 0, its condition number κ(L) is infinite. For simplicity, let us assume that all graphs in this section are connected, and hence 0 = λ 0 < λ 1 ≤ λ 2 ≤ ⋯ ≤ λ N−1. Suppose we restrict the Laplacian operator to the (N − 1)-dimensional space spanned by the eigenvectors f 1, …, f N−1. Then this new operator, call it L 0, has eigenvalues λ 1, …, λ N−1, which are all strictly positive. Now κ(L 0), the condition number of L 0, is a well-defined finite number.

Recall that the complete graph on N vertices, K N , is the most connected graph on N vertices, since any two vertices are joined by an edge. It is the only connected graph whose nonzero Laplacian eigenvalues are all equal, namely λ 0 = 0 and λ 1 = λ 2 = ⋯ = λ N−1 = N. This graph achieves the highest possible algebraic connectivity, λ 1, of a graph on N vertices. If we create L 0 by projecting the Laplacian of K N onto the (N − 1)-dimensional space spanned by the eigenvectors corresponding to the nonzero eigenvalues, then L 0 equals NI N−1, that is, the (N − 1) × (N − 1) identity matrix times N.

Lemma 5.

Let G be a connected graph with eigenvalues {λ k } k=0 N−1 and eigenvectors {f k } k=0 N−1 of the graph Laplacian L. Let \(\tilde{F} = [f_{1}\,f_{2}\,\cdots \,f_{N-1}]\) be the N × (N − 1) matrix of eigenvectors excluding the constant vector f 0 . Then the (N − 1) × (N − 1) matrix

$$\displaystyle{ L_{0} =\tilde{ F}^{{\ast}}L\tilde{F} }$$
(14)

has eigenvalues {λ k } k=1 N−1 and associated orthonormal eigenvectors \(\{\tilde{F}^{{\ast}}f_{k}\}_{k=1}^{N-1}\).

Proof.

We first show that \(\{\tilde{F}^{{\ast}}f_{k}\}_{k=1}^{N-1}\) are eigenvectors to L 0 with eigenvalues λ k . For any k = 1, , N − 1 we have

$$\displaystyle{L_{0}\tilde{F}^{{\ast}}f_{ k} =\tilde{ F}^{{\ast}}L\tilde{F}\tilde{F}^{{\ast}}f_{ k}.}$$

But since the columns of \(\tilde{F}\) form an orthonormal basis for the eigenspace they span, \(\tilde{F}\tilde{F}^{{\ast}}\) is simply the orthogonal projection onto the eigenspace spanned by {f 1, …, f N−1}. That is, for any vector f, we have \(\tilde{F}\tilde{F}^{{\ast}}f = f -\langle f,f_{0}\rangle f_{0}\), which is simply the function f minus its mean component. For each k = 1, …, N − 1, the eigenvector f k has zero mean, i.e., 〈f k , f 0〉 = 0. Hence \(\tilde{F}\tilde{F}^{{\ast}}f_{k} = f_{k}\) and therefore

$$\displaystyle{L_{0}\tilde{F}^{{\ast}}f_{ k} =\tilde{ F}^{{\ast}}Lf_{ k} =\tilde{ F}^{{\ast}}(\lambda _{ k}f_{k}) =\lambda _{k}\tilde{F}^{{\ast}}f_{ k}.}$$

The orthonormality of the eigenvectors \(\{\tilde{F}^{{\ast}}f_{k}\}_{k=1}^{N-1}\) follows directly from the orthonormality of {f k } k = 0 N−1 and the computation

$$\displaystyle{\langle \tilde{F}^{{\ast}}f_{ k},\tilde{F}^{{\ast}}f_{ j}\rangle = (\tilde{F}^{{\ast}}f_{ k})^{{\ast}}\tilde{F}^{{\ast}}f_{ j} = f_{k}^{{\ast}}\tilde{F}\tilde{F}^{{\ast}}f_{ j} = f_{k}^{{\ast}}f_{ j} =\delta (k,j).}$$

Unlike the Laplacian, the operator in (14) is full rank; its rank, N − 1, equals the rank of the Laplacian. We denote it L 0 because it behaves as the Laplacian does once the projection of a function onto the zeroth eigenspace has been removed.
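A quick NumPy check of Lemma 5 on a small graph (a path on four vertices, our own toy example) is given below; it builds \(L_{0} =\tilde{ F}^{{\ast}}L\tilde{F}\) as in (14) and verifies that its spectrum is {λ 1, …, λ N−1}.

import numpy as np

# Adjacency matrix and Laplacian of the path graph on 4 vertices (a toy example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

lam, f = np.linalg.eigh(L)     # lam[0] = 0 and f[:, 0] is the constant eigenvector
F_tilde = f[:, 1:]             # eigenvectors f_1, ..., f_{N-1}
L0 = F_tilde.T @ L @ F_tilde   # the (N-1) x (N-1) operator of (14)

print(np.allclose(np.linalg.eigvalsh(L0), lam[1:]))   # True: L0 has eigenvalues lam_1, ..., lam_{N-1}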

For a general finite graph, the Laplacian can be written as the sum of rank-one matrices \(L =\sum _{ i=1}^{m}v_{i}v_{i}^{{\ast}}\), where v i is the i th column of the incidence matrix B, associated with the i th edge of the graph, and m is the total number of edges. Thus, the Laplacian can be formed by the product \(L = BB^{{\ast}}\). The columns of the incidence matrix B, as vectors in \(\mathbb{R}^{N}\), do not form a frame; B has rank N − 1. However, the restriction of B to the (N − 1)-dimensional space spanned by f 1, …, f N−1, call it B 0, is a frame for that space. The methods of Section 2 therefore apply to the frame B 0 with corresponding frame operator \(L_{0} = B_{0}B_{0}^{{\ast}}\), and the operator L 0 can also be written as the single matrix product \(L_{0} = (\tilde{F}^{{\ast}}B)(\tilde{F}^{{\ast}}B)^{{\ast}}\). For other related results on graphs and frames we refer to [22]. We seek scalars s i  ≥ 0 so that the rescaled frame \(\{s_{i}\tilde{F}^{{\ast}}v_{i}\}_{i=1}^{m}\) is tight or as close to tight as possible. In terms of matrices, we seek a nonnegative diagonal matrix \(X =\mathop{ \mathrm{diag}}\nolimits (s_{i})\) so that \(\tilde{L}_{0}:=\tilde{ F}^{{\ast}}BX^{2}B^{{\ast}}\tilde{F}\) has minimal condition number. The resulting graph Laplacian, denoted \(\tilde{L}_{\kappa } = BX^{2}B^{{\ast}}\), is the full N × N operator corresponding to \(\tilde{L}_{0}\) without the projection onto the (N − 1)-dimensional eigenspace, and thus acts on the entire N-dimensional space. One can interpret this problem as rescaling the weights of the graph edges so as to make \(\tilde{L}_{0}\) as close as possible to the (N − 1) × (N − 1) identity matrix, and thereby make the N × N Laplacian \(\tilde{L}_{\kappa }\) as close as possible (up to scale) to the Laplacian of the complete graph K N .

We present pseudocode for the algorithm, GraphCondition, which produces \(\tilde{L}_{\kappa }\), the Laplacian of the reweighted graph that minimizes the condition number of the restricted operator.

L κ =GraphCondition(L, F, B)

where L is the Laplacian matrix of the graph G,

F is the N × N eigenvector matrix of L,

B is the incidence matrix of L.

  1. Set \(\tilde{F} = F(:,2: N)\).

  2. Use cvx to solve for X that minimizes \(\lambda _{\max }(\tilde{F}^{{\ast}}BX^{2}B^{{\ast}}\tilde{F})\),

     subject to: X⪰0 is diagonal, \(\mathop{\mathrm{trace}}\nolimits (X) \geq t \geq 0\), and \(\tilde{F}^{{\ast}}BX^{2}B^{{\ast}}\tilde{F}\succeq I\).

  3. Create \(L_{\kappa } = BX^{2}B^{{\ast}}\).
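Below is a sketch of this procedure in Python with CVXPY; this is our illustration (the authors' implementation uses cvx in MATLAB), the variable x stands for the diagonal of X 2, and the condition-number step is posed in the form (8), i.e., minimize λ max subject to λ min ≥ 1, omitting the trace bookkeeping of the pseudocode.

import numpy as np
import cvxpy as cp

def graph_condition(L, B):
    # Sketch of GraphCondition: L is the N x N Laplacian, B the N x m incidence matrix.
    N, m = B.shape
    lam, f = np.linalg.eigh(L)
    F_tilde = f[:, 1:]                       # eigenvectors excluding the constant vector
    B0 = F_tilde.T @ B                       # restricted incidence matrix
    x = cp.Variable(m, nonneg=True)          # x_i plays the role of the squared edge weight (X^2)_{ii}
    L0 = cp.Variable((N - 1, N - 1), symmetric=True)
    constraints = [L0 == B0 @ cp.diag(x) @ B0.T, cp.lambda_min(L0) >= 1]
    problem = cp.Problem(cp.Minimize(cp.lambda_max(L0)), constraints)
    problem.solve()
    return B @ np.diag(x.value) @ B.T        # the rescaled Laplacian L_kappa = B X^2 B^*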

Example 2.

We consider the barbell graph G, which consists of two complete graphs on 5 vertices connected by exactly one edge. The Laplacian of G has extreme nonzero eigenvalues λ 1 ≈ 0.2984 and λ 9 ≈ 6.7016, giving a condition number κ(G) ≈ 22.45. We rescaled the edges via the GraphCondition algorithm and obtained a rescaled weighted graph \(\tilde{G}_{\kappa }\) with eigenvalues λ 1 ≈ 0.3900 and λ 9 ≈ 6.991, giving a condition number \(\kappa (\tilde{G}_{\kappa }) \approx 17.9443\).

Both graphs, G and \(\tilde{G}_{\kappa }\), are shown in Figure 1. The edge bridging the two complete clusters is assigned the highest weight of 1.8473. All other edges emanating from those two vertices are assigned the smallest weights of 0.7389. All other edges not connected to either of the two “bridge” vertices are assigned a weight of 1.1019.

Fig. 1

Top: The barbell graph G. Bottom: The conditioned graph with rescaled weights that minimizes the condition number. The width of the edges is drawn to be proportional to the weight assigned to that edge.

We show in the following example that the scaling coefficients {s i } i = 1 m that minimize the condition number of a graph are not necessarily unique.

Example 3.

Consider the graph G, the complete graph on four nodes with the edge (3, 4) removed. G was rescaled and conditioned via GraphCondition; both graphs are shown in Figure 2. The original Laplacian, L, and the rescaled conditioned Laplacian, \(\tilde{L}_{\kappa }\), produced by the GraphCondition algorithm are given as

$$\displaystyle{L = \left [\begin{array}{cccc} 3 & - 1& - 1& - 1\\ - 1 & 3 & - 1 & - 1 \\ - 1& - 1& 2 & 0\\ - 1 & - 1 & 0 & 2 \end{array} \right ],\quad \tilde{L}_{\kappa } \approx \left [\begin{array}{cccc} 2.8406 & - 0.6812& - 1.0797& - 1.0797\\ - 0.6812 & 2.8406 & - 1.0797 & - 1.0797 \\ - 1.0797& - 1.0797& 2.1594 & 0\\ - 1.0797 & - 1.0797 & 0 & 2.1594 \end{array} \right ],}$$

with spectra

$$\displaystyle{\sigma (L) =\{ 0,2,4,4\},\quad \sigma (\tilde{L}_{\kappa }) =\{ 0,2.1594,3.5218,4.3188\}.}$$

Both Laplacians have condition number \(\kappa (L) =\kappa (\tilde{L}_{\kappa }) = 2\), which shows that the edge scalings that minimize the condition number are not necessarily unique.

Fig. 2

The unweighted graph G (left) and its rescaled version \(\tilde{G}_{\kappa }\) (right); both graphs have condition number equal to 2.

We prove that the GraphCondition algorithm will not disconnect a connected graph.

Proposition 1.

Let G = G(V,E,ω) be a connected graph and let \(\tilde{G}_{\kappa } =\tilde{ G}_{\kappa }(V,\tilde{E},\tilde{\omega })\) be the rescaled version of G that minimizes graph condition number. Then \(\tilde{G}_{\kappa }\) is also a connected graph.

Proof.

Suppose that \(\tilde{G}_{\kappa }\) is disconnected. This implies that \(\tilde{G}_{\kappa }\) has eigenvalue 0 with multiplicity at least 2 (one for each of its connected components), so its Laplacian has a nonzero null vector orthogonal to the constant vector f 0. This violates the constraint \(\tilde{F}^{{\ast}}BX^{2}B^{{\ast}}\tilde{F}\succeq I\) imposed in the GraphCondition algorithm, a contradiction.

We next consider the analogue of minimizing the spectral gap, λ N−1 − λ 1, for graphs. Just as before with the condition number, we form the positive definite matrix L 0 and its restricted incidence matrix B 0, and minimize the spectral gap by the methods of Section 2, i.e., by solving problem (5). We denote the rescaled graph that minimizes the spectral gap by \(\tilde{G}_{g}\).

Example 4.

We present numerical results of each of the graph rescaling techniques for the barbell graph shown in Figure 1. Each of the rescaled graphs is pictured in Figure 3 and numerical data is summarized in Table 2.

Fig. 3

From top to bottom: \(\tilde{G}_{\kappa }\) and \(\tilde{G}_{g}\), which minimize the condition number and spectral gap, respectively.

Table 2 Comparison of condition number and spectral gap of the barbell graph, G, shown in Figure 1 and its rescaled versions, respectively.

As discussed in the motivation of this section, reducing the condition number of a graph makes the graph more “complete,” that is, more like the complete graph in terms of its spectrum. The complete graph attains the greatest possible algebraic connectivity λ 1, it is the only graph for which λ 1 = λ N−1, and, being the most connected graph possible, the distance between any two vertices is minimal. The effective resistance is a natural metric on graphs, and one can compute that for any two distinct vertices, i and j, of the complete graph on N vertices we have

$$\displaystyle\begin{array}{rcl} R(i,j)& =& \sum _{k=1}^{N-1}\frac{1} {\lambda _{k}}\left (f_{k}(i) - f_{k}(j)\right )^{2} = \frac{1} {N}\sum _{k=1}^{N-1}\left (f_{ k}(i) - f_{k}(j)\right )^{2} {}\\ & =& \frac{1} {N}(e_{i} - e_{j})^{{\ast}}FF^{{\ast}}(e_{ i} - e_{j}) = \frac{1} {N}(e_{i} - e_{j})^{{\ast}}(e_{ i} - e_{j}) {}\\ & =& \frac{1} {N}\left \|e_{i} - e_{j}\right \|^{2} = \frac{2} {N}. {}\\ \end{array}$$
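The spectral formula for the effective resistance is straightforward to evaluate numerically; the following NumPy helper (our own sketch) computes all pairwise resistances of a connected graph from its Laplacian and checks the value 2∕N on the complete graph.

import numpy as np

def effective_resistances(L):
    # All pairwise effective resistances R(i, j), computed from the spectral formula
    # R(i, j) = sum_{k >= 1} (1 / lambda_k) * (f_k(i) - f_k(j))^2.
    lam, f = np.linalg.eigh(L)
    N = L.shape[0]
    R = np.zeros((N, N))
    for k in range(1, N):                    # skip lambda_0 = 0
        diff = f[:, k][:, None] - f[:, k][None, :]
        R += diff**2 / lam[k]
    return R

N = 5
L = N * np.eye(N) - np.ones((N, N))          # Laplacian of the complete graph K_N
print(np.allclose(effective_resistances(L)[0, 1], 2 / N))   # True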

Conjecture 1.

The process of conditioning a graph reduces the average effective resistance between vertices of the graph.

The intuition behind Conjecture 1 comes from studying the quantity \(\sum _{k=1}^{N-1}1/\lambda _{k}\). Consider a sequence of positive numbers {a k } k = 1 N with average \(\bar{a} = 1/N\sum _{k=1}^{N}a_{k}\). Since the function h(t) = 1∕t is convex on the set of positive numbers, Jensen’s inequality gives

$$\displaystyle{\frac{N} {\bar{a}} = Nh(\bar{a}) \leq \sum _{k=1}^{N}h(a_{ k}) =\sum _{ k=1}^{N} \frac{1} {a_{k}}.}$$

With this fact, let {λ k } k = 1 N−1 denote the nonzero eigenvalues of the connected graph G and \(\{\tilde{\lambda }_{k}\}_{k=1}^{N-1}\) denote those of the conditioned graph \(\tilde{G}_{\kappa }\), both satisfying \(\bar{\lambda }= 1/N\sum _{k=1}^{N-1}\lambda _{k} = 1/N\sum _{k=1}^{N-1}\tilde{\lambda }_{k}\). Since \(\tilde{G}_{\kappa }\) is better conditioned than G, we have \(\sum _{k=1}^{N-1}\vert \tilde{\lambda }_{k}-\bar{\lambda }\vert \leq \sum _{k=1}^{N-1}\vert \lambda _{k}-\bar{\lambda }\vert\); in other words, the eigenvalues \(\{\tilde{\lambda }_{k}\}_{k=1}^{N-1}\) are closer to the average \(\bar{\lambda }\) than the eigenvalues {λ k } k = 1 N−1 are. Hence

$$\displaystyle{ \sum _{k=1}^{N-1}\frac{1} {\tilde{\lambda }_{k}} \leq \sum _{k=1}^{N-1}\frac{1} {\lambda _{k}}. }$$
(15)

Equation (15) almost resembles the effective resistance \(R(i,j) =\sum _{ k=1}^{N-1}\frac{1} {\lambda _{k}}(f_{k}(i) - f_{k}(j))^{2}\), except for the term (f k (i) − f k (j))2. This term is difficult to account for, since little is known about the eigenvectors of \(\tilde{G}_{\kappa }\). The analysis of eigenvectors of perturbed matrices is a largely open area of research, and results are very limited; see [9, 14, 23, 24].

We remark that Conjecture 1 claims that conditioning a graph reduces the average effective resistance between vertices; it is not true that the resistance between every pair of vertices is reduced. If the weight on edge (i, j) is reduced, then the effective resistance between vertices i and j increases. Since we impose that the trace of the Laplacian be preserved, if any edge weight is increased, then at least one other edge weight must be decreased. The vertex pairs for those edges will then have an increased effective resistance between them.

While we lack the theoretical justification, numerical simulations support Conjecture 1 and this is a source of future work.

The authors of [13] take a related approach. They propose using convex optimization to minimize the total effective resistance of the graph,

$$\displaystyle{R_{tot} =\sum _{ i,j=1}^{N}R(i,j).}$$

They show that the optimization problem is related to the problem of reweighting edges to maximize the algebraic connectivity λ 1.