1 Introduction

The wish to generalize the concept of a geometric mean to positive definite (positive for short) matrices and, on the other hand, the need to average quantities expressed by positive matrices in certain applications have led to the definition and the study of the Karcher mean [8, 9, 33].

Without describing all facets, one can consider the set of n×n positive matrices, denoted by \(\mathcal{P}_{n}\), as a manifold [1]; in particular, there is a diffeomorphism from \(\mathcal{P}_{n}\) to \(\mathbb{R}^{n^{2}}\). At each point \(A\in\mathcal{P}_{n}\) one can define the tangent space \(T_{A}\mathcal{P}_{n}\), which can be identified with the space of Hermitian matrices. The Karcher mean can now be defined in terms of a Riemannian geometry on \(\mathcal{P}_{n}\) induced by the inner product

$$ g_A(X,Y):=\operatorname{tr} \bigl(A^{-1}XA^{-1}Y \bigr),\quad X,Y\in T_A\mathcal{P}_n $$
(1.1)

on the tangent space \(T_{A}\mathcal{P}_{n}\). This inner product \(g_{A}\) makes \(\mathcal{P}_{n}\) a complete Riemannian manifold with non-positive curvature and yields the following distance between two matrices \(A,B\in\mathcal{P}_{n}\):

$$ \delta(A,B)={ \Biggl(\sum_{k=1}^n \log^2 \lambda_k \Biggr)^{1/2}}, $$
(1.2)

where \(\lambda_{1},\ldots,\lambda_{n}\) are the eigenvalues of \(A^{-1}B\), which are positive numbers (for all the proofs see [8, Ch. 6]). The Karcher mean of a set of m positive matrices, \(A_{1},\ldots,A_{m}\in\mathcal{P}_{n}\), is defined as the unique positive minimizer \(G(A_{1},\ldots,A_{m})\) of the function

$$ f(X;A_1,\ldots,A_m):=\sum _{j=1}^m \delta^2(X,A_j). $$
(1.3)

Since this mean minimizes the sum of squared intrinsic distances to each of the matrices \(A_{j}\), it is a barycenter of these matrices with respect to the aforementioned metric.
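As a concrete illustration (not part of the original paper), the distance (1.2) and the cost function (1.3) can be evaluated with a few lines of dense linear algebra; the sketch below uses the fact that the eigenvalues of \(A^{-1}B\) coincide with the generalized eigenvalues of the pencil (B,A).

```python
# Minimal sketch, assuming dense Hermitian positive definite NumPy arrays.
import numpy as np
from scipy.linalg import eigvalsh

def delta(A, B):
    """Riemannian distance (1.2): sqrt(sum_k log^2 lambda_k), lambda_k eigenvalues of A^{-1}B."""
    lam = eigvalsh(B, A)          # generalized eigenvalues of (B, A), i.e. of A^{-1}B
    return np.sqrt(np.sum(np.log(lam) ** 2))

def karcher_cost(X, As):
    """Cost function (1.3): sum of squared distances from X to each A_j."""
    return sum(delta(X, A) ** 2 for A in As)
```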

An important feature of the Karcher mean is that it possesses all the properties desired of a geometric mean, such as the ten Ando-Li-Mathias (ALM) axioms [2]. For this reason, it is a viable tool in applications requiring some of these properties [7, 34]. A geometric mean should, for instance, be permutation invariant, monotone, and jointly concave, and should satisfy the arithmetic-geometric-harmonic mean inequality (see [2] for the precise statements of the properties). In particular, one of the most characteristic properties of a geometric mean is its invariance under inversion:

$$ G\bigl(A_1^{-1},\ldots,A_m^{-1} \bigr)=G(A_1,\ldots,A_m)^{-1}. $$
(1.4)

Before the proofs of all the properties of the Karcher mean were available, some of which are very elusive [10, 31], other definitions of a matrix geometric mean had been proposed [2, 12, 14, 35], even though nowadays there is broad agreement in considering the Karcher mean as the “right” matrix geometric mean.

In certain applications, however, besides being positive definite, the data matrices have some further structure in the sense that they belong to some special subset \(\mathcal{S}\), say a linear space. For instance, in the design and analysis of certain radar systems, the matrices to be averaged are correlation matrices, which are positive Toeplitz matrices or positive block Toeplitz matrices with Toeplitz blocks [5, 6, 29, 41]. In these cases, one would like the geometric mean to belong to the same class \(\mathcal{S}\) as the data. Unfortunately, the Karcher mean does not preserve many structures; in particular, the Karcher mean of Toeplitz and/or band matrices is typically no longer of Toeplitz and/or band form, as illustrated by the following simple example.

Example 1.1

Let \(\mathcal{T}\) be the set of tridiagonal Toeplitz matrices and choose \(A_{1},A_{2}\in\mathcal{T}\) with \(A_{1}=I\) and \(A_{2}=\operatorname{tridiag}(1,2,1)\), the matrix with 2’s on the main diagonal and 1’s on the sub- and superdiagonals. We have \(A_{1}A_{2}=A_{2}A_{1}\), thus the Karcher mean equals \((A_{1}A_{2})^{1/2}\). For n=3 we get

$$ (A_1A_2)^{1/2}=\frac{\sqrt{2}}{4} \left [ \begin{array}{c@{\quad}c@{\quad}c} \sqrt{2+\sqrt{2}}+2 & \sqrt{2}\sqrt{2-\sqrt{2}} & \sqrt{2+\sqrt{2}}-2\\ \sqrt{2}\sqrt{2-\sqrt{2}} & 2\sqrt{2+\sqrt{2}} & \sqrt{2}\sqrt{2-\sqrt{2}}\\ \sqrt{2+\sqrt{2}}-2 & \sqrt{2}\sqrt{2-\sqrt{2}} & \sqrt{2+\sqrt{2}}+2 \\ \end{array} \right ] $$
(1.5)

which is neither tridiagonal nor Toeplitz.
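A quick numerical confirmation of this example can be obtained with a few lines of code; the following sketch (ours, not the authors') checks that the corner entries of \((A_{1}A_{2})^{1/2}\) are nonzero and that its diagonal is not constant.

```python
# Minimal sketch: the mean of two commuting positive matrices is (A1 A2)^{1/2}.
import numpy as np
from scipy.linalg import sqrtm

n = 3
A1 = np.eye(n)
A2 = 2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # tridiag(1, 2, 1)

G = np.real(sqrtm(A1 @ A2))
print(np.round(G, 4))
print("corner (1,3):", G[0, 2])          # nonzero -> not tridiagonal
print("diagonal:", np.diag(G))           # not constant -> not Toeplitz
```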

In this paper we introduce the concept of a structured geometric mean of positive matrices in such a way that if \(A_{1},\ldots,A_{m}\in\mathcal{S}\), then their mean also belongs to \(\mathcal{S}\). Given a subset \(\mathcal{S}\) of \(\mathcal{P}_{n}\) and matrices \(A_{1},\ldots,A_{m}\in\mathcal{S}\), we say that \(G\in\mathcal{S}\) is a structured geometric mean with respect to \(\mathcal{S}\) of \(A_{1},\ldots,A_{m}\) if the function \(f(X;A_{1},\ldots,A_{m})\) of (1.3) takes its minimum value over \(\mathcal{S}\) at G. The set of all structured geometric means of \(A_{1},\ldots,A_{m}\) (all minimizers of f over \(\mathcal{S}\)) is denoted by \(G_{\mathcal{S}}=G_{\mathcal{S}}(A_{1},\ldots,A_{m})\).

We show that if \(\mathcal{S}\) is closed (and nonempty) then \(G_{\mathcal{S}}\) is nonempty and the matrices \(G\in G_{\mathcal{S}}\) satisfy most of the ALM axioms in a suitably adjusted form. For instance, the invariance under inversion property (1.4) turns into

$$G_{\mathcal{S}}(A_1,\ldots,A_m)=G_{\mathcal{S}^{-1}} \bigl(A_1^{-1},\ldots,A_m^{-1} \bigr)^{-1}, $$

where for a set \(\mathcal{U}\subseteq\mathcal{P}_{n}\) we denote \(\mathcal{U}^{-1}= \{X^{-1}: X\in\mathcal{U}\}\). That is, the inverse of any structured geometric mean of the matrices \(A_{1},\ldots,A_{m}\in\mathcal{S}\) with respect to \(\mathcal{S}\) coincides with a structured mean of the inverses \(A_{1}^{-1},\ldots, A_{m}^{-1}\) with respect to the set \(\mathcal{S}^{-1}\) where these inverses reside.

Moreover, we show that, in many interesting cases, structured geometric means can be characterized in terms of the positive solutions of a suitable vector equation and provide algorithms for their computation.

In the Toeplitz case we also consider a different approach, where the mean is defined as a barycenter for a suitable metric on the manifold [4]. We analyze this barycenter and its properties in detail, obtaining an explicit expression in the real case and a quick algorithm in the complex case.

The article is organized as follows. In Sect. 2 the cost function (1.3) is examined, with special focus on the existence of a minimizer over a closed set. The structured matrix mean itself is the subject of study in Sect. 3, where the theoretical properties it should satisfy are examined. Section 4 proposes two algorithms for computing a structured mean G in a linear space, together with their convergence analysis. For one algorithm, it is shown that the convergence speed is independent of the condition number of the mean and is faster when the condition numbers of the matrices \(A_{i}^{-1/2}GA^{-1/2}_{i}\), for i=1,…,m, are smaller. Because of its nature and its convergence properties, this algorithm can be viewed as the natural extension to the structured case of the Richardson-like algorithm introduced and analyzed in [11] for the computation of the Karcher mean of unstructured matrices. In Sect. 5, for Toeplitz matrices, a different structured matrix mean [4], defined as a barycenter, is considered, and an algorithm for computing it is developed. Section 6 shows numerical experiments related to the accuracy and speed of computing the structured matrix mean.

Here we recall some basic notation and properties that will be used in the rest of the paper. Given a matrix A, we denote by σ(A) the spectrum of A, that is, the set of all the eigenvalues of A, and by \(\rho(A)=\max_{\lambda\in\sigma(A)}|\lambda|\) the spectral radius of A. Moreover, we denote by \(\|A\|_{F}:=(\operatorname{trace}(A^{*}A))^{1/2}= (\sum_{i,j} |a_{ij}|^{2})^{1/2}\) the Euclidean (Frobenius) norm of A, and by \(\|A\|_{s}=\rho(A^{*}A)^{1/2}\) the spectral norm. By \(A^{*}\) we denote the conjugate transpose of A. Recall that for a positive matrix A there exists a unique positive solution to the equation \(X^{2}=A\). This solution, denoted by \(A^{1/2}\), is called the square root of A [8]. Given a matrix \(A\in\mathbb{C}^{n\times n}\), we use the \(\operatorname{vec}\)-operator to build \(\operatorname{vec}(A)\in\mathbb{C}^{n^{2}}\), the long vector obtained by stacking the columns of A. We will use the Kronecker product ⊗ such that \(A\otimes B\) is the block matrix whose (i,j)th block is \(a_{ij}B\). The vec operator and the Kronecker product interplay in the following way [23]

$$ \operatorname{vec}(ABC)=\bigl(C^T \otimes A\bigr) \operatorname{vec}(B). $$
(1.6)

Finally, we recall a natural partial order in \(\mathcal{P}_{n}\) that will be used in the following: let A and B be positive, we write \(A\ge B\) if the matrix \(A-B\) is positive semidefinite.

2 Existence of structured geometric means

In this section the existence of a structured geometric mean and its relation to the classical Karcher mean are studied. First, some preliminaries are recalled.

2.1 Uniqueness of the Karcher mean for positive matrices

The Riemannian geometry on \(\mathcal{P}_{n}\) given by the inner product (1.1) turns out to be complete and a parametrization of the geodesic joining two positive matrices A and B is known to be [8, 30]

$$ A\#_t B=A^{1/2} \bigl(A^{-1/2}BA^{-1/2} \bigr)^tA^{1/2}=A \bigl(A^{-1}B \bigr)^t,\quad t\in[0,1], $$
(2.1)

where the midpoint \(A\#_{1/2}B\) coincides with the geometric mean of the two matrices [24, 30].

Given a set of matrices \(A_{1},\ldots,A_{m}\in\mathcal{P}_{n}\), the function \(f(X)=f(X;A_{1},\ldots,A_{m})\) in (1.3) is strictly geodesically convex, which means that for any two different matrices \(X, Y\in\mathcal{P}_{n}\), we have

$$ f(X\#_t Y)< (1-t)f(X)+tf(Y),\quad 0<t<1. $$
(2.2)

This property follows from [8, Exercise 6.1.13], where it is stated that for m=1 the function f(X) is strictly geodesically convex. The case m>1 follows by summing up the m inequalities obtained by applying (2.2) to the functions \(f(X)=f(X;A_{i})\), for i=1,…,m, respectively.

Geodesical convexity is a key ingredient for the proof of the existence of a unique minimizer of f over \(\mathcal{P}_{n}\) given in [8, Ch. 6]. A different proof is obtained using the fact that \(\mathcal{P}_{n}\), with the inner product (1.1), forms a Cartan–Hadamard manifold [16, 28, 30, 32], which is a Riemannian manifold, complete, simply connected and with non-positive sectional curvature everywhere. On such a Cartan–Hadamard manifold the Karcher mean (the so-called center-of-mass) exists and is unique [17, 27, 28].

The notion of geodesical convexity in \(\mathcal{P}_{n}\) is different from the customary convexity in the Euclidean space where one requires that

$$f\bigl((1-t)X+tY\bigr)\le(1-t)f(X)+tf(Y),\quad t\in[0,1]. $$

In fact, the function f is not convex in the traditional sense as the following example shows.

Example 2.1

Consider the set consisting of the single matrix A=1, and \(x,y\in\mathbb{R}_{+}^{*}=\mathcal{P}_{1}\). We have \(f(x)=\delta^{2}(x,A)=\log^{2}(x)\), which is not convex. On the other hand, the function \(\log^{2}(x)\) is strictly geodesically convex, and this can be shown by an elementary argument: in fact, it is continuous and for x≠y

$$\begin{aligned} \delta^2(\sqrt{xy},1) =&\log^2(\sqrt{xy})= \frac{1}{4} \bigl(\log^2x+\log^2y+2\log x\log y \bigr) \\ =&\frac{1}{2} \bigl(\log^2 x+\log^2y \bigr)-\frac{1}{4} (\log x-\log y )^2 \\ <&\frac{1}{2} \bigl(\log^2 x+\log^2 y \bigr)= \frac{1}{2} \bigl(\delta^2(x,1)+\delta^2(y,1) \bigr). \end{aligned}$$

Iterative selection of midpoints and a continuity argument complete the proof.

Since f is strictly geodesically convex, it can be proved that it has a unique minimizer over any closed, geodesically convex subset \(\mathcal{S}\) of \(\mathcal{P}_{n}\), where we say that a subset \(\mathcal{S}\subseteq \mathcal{P}_{n}\) is geodesically convex if for any \(X,Y\in\mathcal{S}\) the entire geodesic \(X\#_{t}Y\), t∈[0,1], belongs to \(\mathcal{S}\). Indeed, if \(X_{1}\) and \(X_{2}\) were two different matrices in \(\mathcal{S}\) where f takes its minimum, then from (2.2) it would follow that \(f(X_{1}\#_{t}X_{2})<f(X_{1})=f(X_{2})\) for any 0<t<1, which contradicts the assumption.

2.2 Existence of structured geometric means on a closed set

For a generic closed subset \(\mathcal{U}\) of \(\mathcal{P}_{n}\), which is not necessarily geodesically convex, we can prove the existence of a minimum point by using the fact that f(X) is continuous.

Theorem 2.1

Let \(\mathcal{U}\subseteq\mathcal{P}_{n}\) be a closed subset. Then for any \(A_{1},\ldots,A_{m}\in\mathcal{P}_{n}\) the function \(f(X)=f(X;A_{1},\ldots,A_{m})\) has a minimum in \(\mathcal{U}\).

Proof

Consider the distance measures δ(X,Y), given in (1.2), which can also be written as \(\delta(X,Y)=\|\log(Y^{-1/2}XY^{-1/2})\|_{F}\), and d(X,Y) (the Thompson metric [39]), given by

$$d(X,Y) = \bigl\Vert\log\bigl(Y^{-1/2} X Y^{-1/2}\bigr) \bigr\Vert_s. $$

From the inequality between the Frobenius and the spectral norm, we have

$$d(X,Y) \leq\delta(X,Y) \leq\sqrt{n} d(X,Y) $$

for all \(X,Y\in\mathcal{P}_{n}\). We also define the order interval [X,Y] as

$$[X,Y] = \{ Z\in\mathcal{P}_n: X\leq Z\leq Y \}, $$

for all \(X,Y\in\mathcal{P}_{n}\) with \(X\le Y\) [21]. The order interval \([e^{-r}X,e^{r}X]\) turns out to be the closed ball of radius r>0 around X with respect to the Thompson metric; thus the order interval \([e^{-r}I,e^{r}I]\) is a compact subset of \(\mathcal{P}_{n}\).

Next, consider the set \(Q_{t} = \{ X \in\mathcal{P}_{n}: f(X)\leq t \}\). Because of the continuity of f, each \(Q_{t}\) is closed in \(\mathcal{P}_{n}\). The definition of f also guarantees the boundedness of \(Q_{t}\) for each t with respect to the Riemannian metric δ. By the above inequality between the distance measures, each \(Q_{t}\) is also bounded with respect to the Thompson metric. Hence it is possible to find some r such that \(Q_{t}\) is contained in the closed ball of radius r around I with respect to the Thompson metric, that is, \(Q_{t}\subset[e^{-r}I,e^{r}I]\). Since \(Q_{t}\) is now a closed subset of a compact set (in \(\mathcal{P}_{n}\)), it is also compact in \(\mathcal{P}_{n}\).

Finally, define \(B_{t} = Q_{t} \cap\mathcal{U}\), which is again a compact subset of \(\mathcal{P}_{n}\) (as the intersection of a compact and a closed set). The collection of all nonempty \(B_{t}\) is a descending family of nonempty, compact subsets of \(\mathcal{P}_{n}\), which has a nonempty intersection. The elements in this intersection are the minimizers of f in \(\mathcal{U}\). □

In general, uniqueness of the point where f(X) takes its minimum cannot be guaranteed. For instance, if both A and \(A^{-1}\) belong to \(\mathcal{U}\) while \(I=A\#_{1/2}A^{-1}\) does not, then the function \(f_{1}(X):=\delta^{2}(X,A)+\delta^{2}(X,A^{-1})\) reaches its minimum at a point \(I\ne G \in\mathcal{U}\). Clearly, \(f_{1}(G^{-1})=f_{1}(G)\) and if \(G^{-1}\ne G\) belongs to \(\mathcal{U}\), then we have at least two distinct points of minimum. A more concrete example is the following.

Example 2.2

Consider the 2×2 matrices A=I and \(B=\operatorname{diag}(a,1/a)\), where a>1. Define the segment \(\mathcal{U}= \{G(t)=A+t(B-A), t\in [0,1] \}\), which is closed and convex, but not geodesically convex. The function \(f(t)=\delta^{2}(G(t),A)+\delta^{2}(G(t),B)\) takes the form \(f(t)=\log^{2}((1-t)/a+t)+\log^{2}(a(1-t)+t)+\log^{2}((1-t)+t/a)+\log^{2}((1-t)+at)\) and is symmetric with respect to t=1/2. For a=200 the function has the graph shown in Fig. 1, with a local maximum at t=1/2 and two global minima close to the edges of the segment.

Fig. 1 Graph of \(f(t)=\delta^{2}(G(t),A)+\delta^{2}(G(t),B)\) for G(t)=A+t(B−A) with A=I and \(B=\operatorname{diag}(200,1/200)\)
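The behaviour described in this example is easy to reproduce numerically; the sketch below (an illustration, not taken from the paper) evaluates the explicit expression of f(t) on a grid and exhibits the local maximum at t=1/2 and a minimizer away from it.

```python
# Minimal sketch of Example 2.2 with a = 200, using the explicit formula for f(t).
import numpy as np

a = 200.0
def f(t):
    return (np.log((1 - t) / a + t) ** 2 + np.log(a * (1 - t) + t) ** 2
            + np.log((1 - t) + t / a) ** 2 + np.log((1 - t) + a * t) ** 2)

ts = np.linspace(1e-3, 1 - 1e-3, 2001)
vals = np.array([f(t) for t in ts])
print("f(1/2) =", f(0.5))                          # local maximum
print("argmin on the grid:", ts[np.argmin(vals)])  # a global minimizer away from t = 1/2
```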

3 A theoretical exploration of the structured geometric mean

In this section we discuss the relation between the structured and generic geometric mean, together with the adaptation of the generic properties to the structured setting. We will discuss just the real case, so in this section, the set \(\mathcal{P}_{n}\) stands for the manifold of real positive definite matrices whose tangent space is the set of real and symmetric matrices.

3.1 The geometric and structured geometric mean relation

The properties shown in Sect. 2 imply that a structured geometric mean with respect to \(\mathcal{U}\), as defined in the introduction, always exists for any closed subset \(\mathcal{U}\) of \(\mathcal{P}_{n}\). In particular, this holds in the cases where \(\mathcal{U}=\mathcal{S}\cap\mathcal{P}_{n}\) for any linear space \(\mathcal{S}\) of matrices and also for \(\mathcal{U}^{-1}:=\mathcal{S}^{-1}\cap \mathcal{P}_{n}\) where we define \(\mathcal{S}^{-1}= \{A^{-1}: A\in\mathcal{S}, \det A\ne0 \}\). This captures a wide class of interesting structures emerging in applications, e.g., Toeplitz and band matrices, as well as their inverses. For simplicity we will restrict our analysis in the remainder of the article to the real case.

More general structures are given in terms of a parametrization \(\sigma(t):\mathcal{V}\to \mathbb{R}^{n\times n}\), with σ a differentiable function defined on the open subset \(\mathcal{V}\) of \(\mathbb{R}^{q}\), which we will call the parameter space. The set \(\mathcal{T}=\sigma(\mathcal{V})\) is the structure determined by σ. If σ is linear and \(\mathcal{V}=\mathbb{R}^{q}\), then \(\mathcal{T}\) is a linear space. Examples of sets \(\mathcal{T}\) of interest which generally do not form a linear space are the set of matrices with a given displacement rank [13] and the sets of semiseparable [40] and quasiseparable [19] matrices. For an n×n symmetric Toeplitz matrix, a possible parametrization is given by

$$ \sigma(t)=\sigma\bigl([t_0,t_1, \ldots,t_{n-1}]\bigr)=\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c} t_0&t_1 &\ldots&t_{n-1}\\ t_1&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&t_1\\ t_{n-1}&\ldots&t_1&t_0 \end{array} \right ]. $$
(3.1)

For a band matrix, one can, e.g., just store the nonzero elements in a long vector and map them onto their exact locations. In the following, given a closed set \(\mathcal{T}\), we let \(\mathcal{U}=\mathcal{T}\cap\mathcal{P}_{n}\).

In Example 2.2 we illustrated that the minimum of the cost function restricted to a closed subset \(\mathcal{U}\subseteq \mathcal{P}_{n}\) is not necessarily unique. For this reason, we consider the structured geometric mean \(G_{\mathcal{U}}=G_{\mathcal{U}}(A_{1},\ldots,A_{m})\) of \(A_{1},\ldots,A_{m}\in\mathcal{U}\) as the set of matrices in \(\mathcal{U}\) where the function f(X) attains its minimum. Formally speaking, for \(A_{1},\ldots,A_{m}\in\mathcal{U}\), let \(g\in\mathbb{R}^{q}\) be such that \(\widehat{G}=\sigma(g)\in G_{\mathcal{U}}(A_{1},\ldots,A_{m})\), then

$$f\bigl(\sigma(g);A_1,\ldots,A_m\bigr)=\min _{t\in\mathbb{R}^q}f\bigl(\sigma (t);A_1,\ldots,A_m \bigr). $$

Since \(\mathcal{U}\subseteq\mathcal{P}_{n}\), the minimum over \(\mathcal{P}_{n}\) is less than or equal to the minimum over \(\mathcal{U}\). In general it will often happen that \(\widehat{G}\ne G(A_{1},\ldots,A_{m})\) like in (1.5).

3.2 Properties of the geometric mean conveyed to the structured mean setting

Some desired properties for a matrix geometric mean were stated by Ando, Li and Mathias in [2], of which the most notable are listed here. The other properties include joint concavity, the determinantal identity, and continuity from above.

Consistency with scalars :

If \(A_{1},\ldots,A_{m}\) commute, then

$$G(A_1,\ldots,A_m)=(A_1\cdots A_m)^{1/m}. $$
Permutation invariance :

For any permutation π of {1,…,m},

$$G(A_1,\ldots,A_m)=G (A_{\pi(1)}, \ldots,A_{\pi(m)} ). $$
Joint homogeneity :
$$G(\alpha_1 A_1,\alpha_2 A_2, \ldots ,\alpha_m A_m)=(\alpha_1\cdots \alpha_m)^{1/m}G(A_1,\ldots,A_m). $$
Monotonicity :

If \(A_{i} \geq A'_{i}\), for i=1,…,m, then

$$G(A_1,\ldots,A_m) \geq G\bigl(A'_1, \ldots,A'_m\bigr). $$
Invariance under congruence :

For any nonsingular M,

$$G\bigl(M^*A_1M,\ldots,M^*A_mM\bigr)=M^*G(A_1, \ldots,A_m)M. $$
Invariance under inversion :
$$G(A_1,\ldots,A_m)^{-1}=G\bigl(A_1^{-1}, \ldots,A_m^{-1}\bigr). $$
Arithmetic-geometric-harmonic mean inequality :
$$\frac{1}{m}(A_1+\cdots+A_m)\ge G(A_1, \ldots,A_m)\ge m \bigl({A_1^{-1}+ \cdots+A_m^{-1}} \bigr)^{-1}. $$

Yet another property naturally desired of a geometric mean, but not required in the list of Ando, Li and Mathias, is the repetition invariance, that is, for any set of positive matrices \(A_{1},\ldots,A_{m}\in\mathcal{P}_{n}\),

$$ G(A_1,\ldots,A_m,A_1, \ldots,A_m)=G(A_1,\ldots,A_m). $$
(3.2)

Now, we consider the properties of the structured geometric mean. Some properties such as the permutation invariance trivially hold, others should be restated. In fact, in the generic case the structures we consider are neither invariant under inversion nor under congruence. That is because if \(A\in\mathcal{U}\) then it need not necessarily hold that \(A^{-1}\in\mathcal{U}\) or \(M^{*}AM\in\mathcal{U}\).

We start with the invariance under inversion as this is one of the most characteristic properties of the geometric mean. To this end we consider the set \(\mathcal{T}^{-1}= \{T^{-1}: T\in\mathcal{T},~\det T\ne0 \}\) parametrized with the function σ −1(t):=σ(t)−1. Clearly, the intersection \(\mathcal{U}\) of \(\mathcal{T}\) with \(\mathcal{P}_{n}\) yields always invertible matrices, so that \(\mathcal{T}^{-1}\cap\mathcal{P}_{n}=\mathcal{U}^{-1}\).

According to our definition, the structured geometric mean of \(A_{1}^{-1},\ldots,A_{m}^{-1}\in\mathcal{U}^{-1}\) is given by the set \(G_{\mathcal{U}^{-1}}(A_{1}^{-1},\ldots,A_{m}^{-1})\). For any \(\widetilde{G}\in G_{\mathcal{U}^{-1}}\), we have \(\widetilde{G}=\sigma(\widetilde{g})^{-1}\) such that

$$f \bigl(\sigma (\widetilde{g} )^{-1};A_1^{-1}, \ldots ,A_m^{-1} \bigr)=\min_{t\in\mathbb{R}^q} f \bigl(\sigma(t)^{-1};A_1^{-1}, \ldots,A_m^{-1} \bigr). $$

Since \(\delta(A,B)=\delta(A^{-1},B^{-1})\), one gets \(f(X;A_{1},\ldots,A_{m})=f(X^{-1};A_{1}^{-1},\ldots,A_{m}^{-1})\) so that

$$f \bigl(\sigma(\widetilde{g});A_1,\ldots,A_m \bigr)=\min _{t\in \mathbb{R}^q}f \bigl(\sigma(t);A_1,\ldots,A_m \bigr) $$

and thus \(\widetilde{G}^{-1}\in G_{\mathcal{U}}(A_{1},\ldots,A_{m})\). Since \(\widetilde{G}\) was chosen arbitrarily, and since \(\mathcal{U}\) can be interchanged with \(\mathcal{U}^{-1}\), we have the analogue of the invariance under inversion for the structured geometric mean:

$$ G_{\mathcal{U}}(A_1,\ldots,A_m)^{-1}=G_{\mathcal{U}^{-1}} \bigl(A_1^{-1},\ldots, A_m^{-1} \bigr). $$
(3.3)

In a similar manner we can restate the invariance under congruence in a structured style by defining, for any nonsingular M, the set \(\mathcal{U}_{M}:=M^{*}\mathcal{U}M=\{M^{*}TM: T\in\mathcal{U}\}\). The invariance under congruence is then understood as

$$G_{\mathcal{U}_M}\bigl(M^*A_1M,\ldots,M^*A_mM \bigr)=M^*G_{\mathcal{U}}(A_1,\ldots , A_m)M. $$

Joint homogeneity, in order to be defined, requires that the set \(\mathcal{T}\) satisfies the following property:

$$A\in\mathcal{T}\quad\Rightarrow\quad\alpha A\in\mathcal{T} $$

for any scalar α>0. This property clearly holds if \(\mathcal{T}\) is a linear space or the set formed by the inverses of the nonsingular matrices of a linear space. For these sets, the joint homogeneity holds.

Repetition invariance holds true as well by (1.3), since

$$f(X;A_1,\ldots,A_m,A_1, \ldots,A_m)=2f(X;A_1,\ldots,A_m), $$

so the minimizers (over a subset) of the functions \(f(X;A_{1},\ldots,A_{m},A_{1},\ldots,A_{m})\) and \(f(X;A_{1},\ldots,A_{m})\) are the same.

Regarding the remaining properties, we observe that the consistency with scalars is violated, as Example 1.1 shows. Nevertheless, weaker consistency properties hold, such as idempotency, namely \(G_{\mathcal{U}}(A,A,\ldots,A)=A\) for each structure \(\mathcal{U}\) and \(A\in\mathcal{U}\). Moreover, if the set \(\mathcal{U}\) is closed and geodesically convex then

$$G_{\mathcal{U}}(A_1,\ldots,A_m)=G(A_1, \ldots,A_m), $$

so the geometric and the structured geometric mean coincide. An interesting case of a geodesically convex set is given by \(\mathcal{U}=\mathcal{T}\cap \mathcal{P}_{n}\), when \(\mathcal{T}\) is an algebra, i.e., a linear space closed under multiplication and inversion.

Finally, the properties related to the ordering of positive matrices, such as monotonicity, do not hold, as shown by the following numerical example.

Example 3.1

We consider the four Toeplitz matrices

$$\begin{aligned} &T_1= \left [\begin{array}{c@{\quad}c@{\quad}c} 1 & 1/2 & 1/2 \\ 1/2 & 1 & 1/2 \\ 1/2 & 1/2 & 1\\ \end{array} \right ] , \qquad T_2=T_1, \\ &T_3= \left [\begin{array}{c@{\quad}c@{\quad}c} 3/4 & 1/2 & 0 \\ 1/2 & 3/4 & 1/2 \\ 0 & 1/2 & 3/4 \\ \end{array} \right ] ,\qquad S= \left [\begin{array}{c@{\quad}c@{\quad}c} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1\\ \end{array} \right ] , \end{aligned}$$

and, using the algorithms presented in the next sections, we compute a structured geometric mean \(G_{\varepsilon}\) of the three matrices \(T_{1}\), \(T_{2}\) and \(T_{3}+\varepsilon S\) for various ε≥0. The norm of \(G_{\varepsilon}-G_{0}\) becomes small as ε tends to 0, and we observe that \(G_{\varepsilon}-G_{0}\) is not positive (semi)definite, while \(T_{3}+\varepsilon S\ge T_{3}\). This gives numerical evidence of the lack of monotonicity of a structured geometric mean. On the other hand, computing the arithmetic mean A of \(T_{1}\), \(T_{2}\) and \(T_{3}\), one also observes that the expected inequality \(A\ge G_{0}\) does not hold in this case.

3.3 The structured mean as solution(s) of a vector equation

We start from the Karcher mean, which is obtained as the unique solution in \(\mathcal{P}_{n}\) of the matrix equation

$$ \sum_{i=1}^m\log \bigl(XA_i^{-1} \bigr) = 0. $$
(3.4)

Equation (3.4) is obtained using the fact that f is differentiable and has a minimum at the Karcher mean. Thus the Karcher mean satisfies the condition \(\nabla f_{X}=0\), where \(\nabla f_{X}=2X^{-1}\sum_{i=1}^{m} \log (XA_{i}^{-1} )\) denotes the (Euclidean) gradient of f with respect to X (see [26, 33]). We remark already that a different metric will be studied in Sect. 4.3.
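For reference, the quantity \(\varGamma(X)=X^{-1}\sum_{i}\log(XA_{i}^{-1})\), i.e. one half of this gradient, can be evaluated as in the following sketch (our own illustration); the logarithm is taken of the symmetric positive definite matrix \(A_{i}^{-1/2}XA_{i}^{-1/2}\) and mapped back by a congruence.

```python
# Minimal sketch, assuming real symmetric positive definite NumPy arrays.
import numpy as np
from scipy.linalg import sqrtm, logm, solve

def Gamma(X, As):
    """Gamma(X) = X^{-1} sum_i log(X A_i^{-1}), i.e. one half of the Euclidean gradient."""
    S = np.zeros_like(X, dtype=float)
    for A in As:
        R = np.real(sqrtm(A))                          # A^{1/2}
        M = solve(R, solve(R, X).T).T                  # A^{-1/2} X A^{-1/2}
        S += R @ np.real(logm(M)) @ np.linalg.inv(R)   # = log(X A^{-1})
    return solve(X, S)
```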

In the general case, the restriction of f to a structure given by σ(t) is investigated. For any minimum g (with corresponding σ(g)) not located at the boundary of the parameter space, the gradient \(\nabla(f\circ\sigma)_{t}\) of the composed function with respect to t must be zero, so we are interested in the solutions of the vector equation \(\nabla(f\circ\sigma)_{t}=0\).

From the chain rule of differentiation, one obtains

$$\nabla(f\circ\sigma)_t= \biggl(\sum_{i,j} \frac{\partial f(\sigma (t))}{\partial x_{i,j}}\frac{d \sigma_{i,j}(t)}{d t_s} \biggr)_{s=1,\ldots,q}=0 $$

which leads to the vector equation

$$ \sum_{i,j} \bigl(\varGamma\bigl(\sigma(t) \bigr) \bigr)_{i,j}\frac{d \sigma _{i,j}(t)}{d t_s}=0,\quad s=1,\ldots,q, $$
(3.5)

where \(\varGamma(X):=\frac{1}{2}\nabla f_{X}\).

In the case where \(\mathcal{T}\) is a linear space, the function σ(t) is linear and can be written in matrix form as

$$\operatorname{vec}\bigl(\sigma(t)\bigr)=Ut,\quad U\in\mathbb{R}^{n^2\times q}, $$

so that (3.5) turns into

$$ U^T\operatorname{vec}\bigl(\varGamma\bigl(\sigma(t) \bigr)\bigr)=0,\qquad\varGamma (X)=X^{-1}\sum _{i=1}^m \log \bigl(XA_i^{-1} \bigr). $$
(3.6)

If σ(t) is chosen to be orthogonal, i.e., such that \(U^{T}U=I\), then \(U^{T}\) coincides with the Moore–Penrose inverse of U.

When \(\mathcal{T}\) denotes the set of symmetric Toeplitz matrices, the parametrization (3.1) leads to a matrix U having orthogonal columns. In fact one has \(U^{T}U=D=\operatorname{diag}(n,2(n-1),2(n-2),\ldots,2)\). In particular, for n=3 one has

$$U^T=\left [ \begin{array}{ccccccccc} 1&0&0&0&1&0&0&0&1\\ 0&1&0&1&0&1&0&1&0\\ 0&0&1&0&0&0&1&0&0 \end{array} \right ],\qquad U^TU=\operatorname{diag}(3,4,2). $$
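For illustration (this is not the authors' code), the matrix U of the Toeplitz parametrization (3.1) and the residual of the vector equation (3.6) can be assembled as follows; the column-major flattening plays the role of the \(\operatorname{vec}\) operator.

```python
# Minimal sketch for the symmetric Toeplitz structure; sigma(t) = toeplitz(t).
import numpy as np
from scipy.linalg import logm, toeplitz

def toeplitz_U(n):
    """Columns of U are vec of the basis matrices in (3.1): vec(sigma(t)) = U t."""
    U = np.zeros((n * n, n))
    for k in range(n):
        E = toeplitz(np.eye(n)[:, k])          # symmetric Toeplitz with e_k as first column
        U[:, k] = E.flatten(order="F")         # column-stacking vec
    return U

def residual(t, As):
    """U^T vec(Gamma(sigma(t))), the left-hand side of (3.6)."""
    n = len(t)
    X = toeplitz(t)
    G = np.linalg.solve(X, sum(np.real(logm(X @ np.linalg.inv(A))) for A in As))
    return toeplitz_U(n).T @ G.flatten(order="F")

print(toeplitz_U(3).T @ toeplitz_U(3))         # diag(3, 4, 2), as displayed above
```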

For \(\mathcal{T}\) being the set of symmetric tridiagonal matrices the parametrization

$$\sigma(t)=\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} t_1&t_{n+1}\\ t_{n+1}&t_2&t_{n+2}\\ &\ddots&\ddots&\ddots\\ & &t_{2n-2}&t_{n-1}&t_{2n-1}\\ & & &t_{2n-1}&t_n \end{array} \right ] $$

also leads to a matrix U having orthogonal columns. Moreover, \(U^{T}U=\operatorname{diag}(I_{n}, 2I_{n-1})\). For n=3, e.g., one has

$$U^T=\left [ \begin{array}{ccccccccc} 1&0&0&0&0&0&0&0&0\\ 0&0&0&0&1&0&0&0&0\\ 0&0&0&0&0&0&0&0&1\\ 0&1&0&1&0&0&0&0&0\\ 0&0&0&0&0&1&0&1&0 \end{array} \right ],\qquad U^TU=\operatorname{diag}(I_3,2I_2). $$

4 Algorithms for structured geometric means in the linear case

We will give two algorithms for computing structured geometric means when they are characterized in terms of the solutions g of a vector equation, as, for instance, in the linear case.

We first provide a general definition of a class of algorithms based on a preconditioned functional iteration, then we specialize to two algorithms given by two different preconditioners.

The first, provided in Sect. 4.2, is derived by relying on the projection of the gradient with respect to the Euclidean scalar product. The second, presented in Sect. 4.3, is obtained through projection with respect to the Riemannian metric of \(\mathcal{P}_{n}\) described in Sect. 1.

4.1 A preconditioned functional iteration and its convergence

Throughout this section we assume that \(A_{1},\ldots,A_{m}\in\mathcal{U}\), where \(\mathcal{U}=\mathcal{T}\cap\mathcal{P}_{n}\) and \(\mathcal{T}\) is a linear space with a parametrization σ(t) such that \(\operatorname{vec}(\sigma(t))=Ut\), and \(D=U^{T}U\).

The structured geometric mean \(G_{\mathcal{U}}\) is defined as the set of minimizers of the function \(f(X;A_{1},\ldots,A_{m})\) over \(\mathcal{U}\). These minimizers must be sought among the stationary points of the function f, that is, among the solutions to the vector equation (3.6).

Therefore, a way to design algorithms for computing structured means \(G_{\mathcal{U}}\) is to apply numerical techniques to solve the vector equation (3.6). We consider a preconditioned Richardson-like iteration constructed in the spirit of [11]. Let V(X) be a nonsingular and sufficiently differentiable matrix function and define

$$ \begin{aligned} & \varphi(t)=t-\theta S(t),\qquad S(t)=V\bigl(\sigma(t)\bigr)^{-1}U^T \operatorname{vec} \bigl(\varGamma\bigl(\sigma(t)\bigr)\bigr), \\ & t_{\nu+1}=\varphi(t_\nu),\quad \nu=0,1,\ldots, \end{aligned} $$
(4.1)

where θ is a parameter introduced to enhance convergence, V(σ(t)) is a preconditioner and \(t_{0}\) is a given vector such that \(\sigma(t_{0})\) is positive. Observe that the fixed points of φ(t) are the solutions of the vector equation (3.6); hence, if convergent, the sequence \(t_{\nu}\) converges to a solution of (3.6).
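Schematically, the iteration (4.1) can be implemented as follows; this is only a sketch, where sigma, gamma and V are user-supplied callables for \(\sigma(t)\), \(\varGamma(X)\) and the preconditioner, and the stopping criterion is a simple relative test on the correction.

```python
# Minimal sketch of the preconditioned Richardson-like iteration (4.1).
import numpy as np

def structured_mean_iteration(t0, sigma, U, gamma, V, theta, tol=1e-12, maxit=500):
    t = np.asarray(t0, dtype=float)
    for _ in range(maxit):
        X = sigma(t)                               # current structured iterate sigma(t_nu)
        r = U.T @ gamma(X).flatten(order="F")      # U^T vec(Gamma(sigma(t_nu)))
        s = np.linalg.solve(V(X), r)               # S(t_nu) = V(sigma(t_nu))^{-1} r
        t = t - theta * s
        if np.linalg.norm(s) <= tol * np.linalg.norm(t):
            break
    return t
```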

In the following, given a matrix function f(X), where \(X=(x_{i,j})\) and f(X) are n×n matrices, we denote by \(J_{f}(G)\) the \(n^{2}\times n^{2}\) Jacobian matrix of \(\operatorname{vec}(f(X))\) with respect to the variable \(\operatorname{vec}(X)\) computed at X=G; similarly, we denote by \(J_{f\circ\sigma}(t_{G})\) the \(n^{2}\times q\) Jacobian of the composed function \(\operatorname{vec}(f(\sigma(t)))\) with respect to the variables \((t_{1},\ldots,t_{q})\) at \(t=t_{G}\). In this notation, the function in the subscript as well as the variable between parentheses specify whether the derivatives are taken w.r.t. the matrix variable X or the vector variable t.

Observe that if V(σ(t)) is chosen as the Jacobian of \(U^{T}\operatorname{vec}(\varGamma(\sigma(t)))\), then (4.1) coincides with Newton’s iteration.

If \(t_{G}\) is a solution of (3.6) and if \(t_{\nu}\) is sufficiently near to \(t_{G}\), then

$$t_{\nu+1}-t_G=J_\varphi(t_G) (t_\nu-t_G)+O \bigl(\|t_\nu-t_G \| ^2 \bigr), $$

so that in order to study the local convergence of this sequence it is sufficient to estimate the spectral radius ρ, or any induced norm, of \(J_{\varphi}(t_{G})\) and determine θ in such a way that \(\rho(J_{\varphi}(t_{G}))<1\). Notice that the Jacobian of φ(t) at \(t=t_{G}\) is given by \(I-\theta K\), where \(K=J_{S}(t_{G})\) is the Jacobian of S(t) at \(t=t_{G}\). Therefore, if we can find a preconditioner V(t) such that K has real positive eigenvalues with minimum and maximum eigenvalues \(\kappa_{\min}\) and \(\kappa_{\max}\), respectively, then the choice \(\theta=2/(\kappa_{\min}+\kappa_{\max})\) ensures local convergence and provides the minimum spectral radius of \(J_{\varphi}(t_{G})\), given by

$$\rho\bigl(J_\varphi(t_G)\bigr)=\frac{\kappa_{\max}-\kappa_{\min}}{\kappa _{\max}+\kappa_{\min}}= \frac{\mu-1}{\mu+1}<1,\quad\mu=\kappa_{\max}/\kappa_{\min}. $$

Moreover, any values \(\hat{\kappa}_{\min}\le\hat{\kappa}_{\max}\) such that \(\hat{\kappa}_{\min}\le \kappa_{\min}\le\kappa_{\max}\le\hat{\kappa}_{\max}\) can be used instead of \(\kappa_{\min}\) and \(\kappa_{\max}\) to determine a value \(\hat{\theta}=2/(\hat{\kappa}_{\min}+\hat{\kappa}_{\max})\) which ensures convergence. Also notice that the closer μ is to 1, the faster the convergence of the iteration.

Therefore our goal is to perform a spectral analysis of K and to find an upper bound on the ratio \(\mu=\kappa_{\max}/\kappa_{\min}\), assuming that all the eigenvalues of K are real positive. From the composition rule of derivatives one finds that

$$K=V\bigl(\sigma(t_G)\bigr)^{-1} U^TJ_\varGamma(G) U+J_{V(\sigma (t_G))^{-1}}\bigl(\sigma(t_G)\bigr) U^T \operatorname{vec}\bigl(\varGamma\bigl(\sigma(t_G)\bigr)\bigr) $$

and since \(U^{T}\operatorname{vec}(\varGamma(\sigma(t_{G})))=0\), it follows that

$$ K=V\bigl(\sigma(t_G)\bigr)^{-1}U^TJ_\varGamma(G) U. $$
(4.2)

To evaluate \(J_{\varGamma}(G)\), we recall that \(\varGamma(X)=\sum_{i=1}^{m} X^{-1}\log(XA_{i}^{-1})\), so that it is sufficient to determine the formal expression of \(J_{\psi}(G)\) for \(\psi(G,A)=G^{-1}\log(GA^{-1})\) for a generic A and then to write \(J_{\varGamma}(G)=\sum_{i=1}^{m}J_{\psi(G,A_{i})}(G)\). In order to evaluate \(J_{\psi}(G)\), we rely on the definition of the Fréchet derivative of a matrix function f(X) at X in the direction E,

$$Df_X[E]=\lim_{t\to0}\frac{f(X+tE)-f(X)}{t}=\frac{d}{dt}\bigg\vert_{t=0}f(X+tE). $$

In fact, the \(n^{2}\times n^{2}\) Jacobian matrix \(J_{f}(X)\) of the vector function \(\operatorname{vec}\circ f \circ \operatorname{vec}^{-1}\) at \(\operatorname{vec}(X)\) is related to the Fréchet derivative by the equation

$$ \operatorname{vec}\bigl(Df_X[E] \bigr)=J_f(X)\operatorname{vec}(E). $$
(4.3)

We recall also the following properties of the Fréchet derivative [22], where f,g are given matrix functions and \(\varphi(X)=X^{-1}\):

$$ \begin{aligned} &D(fg)_X[E]=Df_X[E]g(X)+f(X)Dg_X[E], \quad \hbox{product rule,} \\ &D(f\circ g)_X[E]=Df_{g(X)}\bigl[Dg_X[E] \bigr],\quad \hbox{chain rule,} \\ &D\varphi_{X}[E]=-X^{-1}EX^{-1},\quad \hbox{inversion.} \end{aligned} $$
(4.4)

For the derivative of the exponential function we have (see [22, Eq. 10.17a])

$$J_{\exp}(Y)=(I\otimes\exp Y)\beta \bigl(Y^T\otimes I-I \otimes Y \bigr),\quad\beta(z)=\bigl(e^z-1\bigr)/z. $$

Therefore, since \(J_{\log}(X)=J_{\exp}(Y)^{-1}\) for \(Y=\log X\), we find that

$$ J_{\log}(X)=\gamma \bigl(\log \bigl(X^T \bigr) \otimes I-I\otimes \log X \bigr) \bigl(I\otimes X^{-1} \bigr),\quad \gamma(z)=z/\bigl(e^z-1\bigr). $$
(4.5)

We are now ready to provide an explicit expression of the Fréchet derivative of the function \(\psi(X,A)=X^{-1}\log(XA^{-1})\) and of the Jacobian \(J_{\psi}(X)\).

Lemma 4.1

Let \(\psi(X)=X^{-1}\log(XA^{-1})\). Assume that A and X are positive. For the matrix \(J_{\psi}(X)\) such that \(\operatorname{vec}(D\psi_{X}[E])=J_{\psi}(X)\operatorname{vec}(E)\) we have

$$\begin{aligned} &J_{\psi}(X)=-X^{-1}\log \bigl(XA^{-1} \bigr)\otimes X^{-1} +\bigl(A^{-1}\otimes X^{-1}\bigr)\gamma(W) \bigl(I\otimes AX^{-1}\bigr), \\ &W=\log \bigl(XA^{-1} \bigr)\otimes I-I\otimes\log \bigl(XA^{-1} \bigr), \end{aligned}$$

with \(\gamma(z)=z/(e^{z}-1)\).

Proof

Since \(h(X):=\log(XA^{-1})\) is the composition of \(f(X)=\log(X)\) and \(g(X)=XA^{-1}\), we get by (4.4)

$$Dh_X[E]=D{\log}_{XA^{-1}}\bigl[EA^{-1}\bigr]. $$

As ψ(X) is the product of \(f(X)=X^{-1}\) and h(X), (4.4) gives us

$$D{\psi}_X[E]=-X^{-1}EX^{-1}\log \bigl(XA^{-1} \bigr)+X^{-1}Dh_X[E]. $$

Combining the latter two equations yields

$$D{\psi}_X[E]=-X^{-1}EX^{-1}\log \bigl(XA^{-1} \bigr)+X^{-1}D{\log }_{XA^{-1}} \bigl[EA^{-1}\bigr]. $$

By using (4.3) and (1.6) we find that the matrix \(J_{\psi}(X)\) representing \(D\psi_{X}\) is given by

$$J_{\psi}(X)=- \bigl(X^{-1}\log \bigl(XA^{-1} \bigr) \bigr)^T\otimes X^{-1}+ \bigl(I\otimes X^{-1} \bigr)J_{\log} \bigl(XA^{-1} \bigr) \bigl(A^{-T} \otimes I \bigr). $$

Replacing (4.5) in the equation above and using the fact that \(A=A^{T}\), \(X=X^{T}\) yields

$$\begin{aligned} J_{\psi}(X) =&-\log \bigl(A^{-1}X \bigr)X^{-1} \otimes X^{-1} \\ &{}+\bigl(I\otimes X^{-1}\bigr)\gamma \bigl(\log \bigl(A^{-1}X \bigr)\otimes I-I\otimes\log \bigl(XA^{-1} \bigr) \bigr) \bigl(A^{-1}\otimes AX^{-1} \bigr). \end{aligned}$$

Using the fact that \(W\log(V)W^{-1}=\log(WVW^{-1})\), the first term can be written as \(-X^{-1}\log(XA^{-1})\otimes X^{-1}\). The second term can be written as \((I\otimes X^{-1})(A^{-1}\otimes I)\gamma (\log(XA^{-1})\otimes I-I\otimes\log(XA^{-1}) )(I\otimes AX^{-1})\), which completes the proof. □
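The formula of Lemma 4.1 can be checked numerically against a finite-difference approximation of the Fréchet derivative; the following sketch (our own, using the column-stacking \(\operatorname{vec}\) convention of (1.6)) evaluates γ(W) through an eigendecomposition of \(M=A^{-1/2}XA^{-1/2}\).

```python
# Minimal sketch: finite-difference check of J_psi(X) for psi(X) = X^{-1} log(X A^{-1}).
import numpy as np
from scipy.linalg import logm, sqrtm

def gamma_fun(z):
    """gamma(z) = z / (e^z - 1), with the removable singularity at z = 0 set to 1."""
    z = np.asarray(z, dtype=float)
    out = np.ones_like(z)
    nz = np.abs(z) > 1e-14
    out[nz] = z[nz] / np.expm1(z[nz])
    return out

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n)); A = B @ B.T + n * np.eye(n)   # random positive A
C = rng.standard_normal((n, n)); X = C @ C.T + n * np.eye(n)   # random positive X
E = rng.standard_normal((n, n)); E = (E + E.T) / 2             # symmetric direction

Ah = np.real(sqrtm(A)); Ahi = np.linalg.inv(Ah)
d, Q = np.linalg.eigh(Ahi @ X @ Ahi)            # M = A^{-1/2} X A^{-1/2} = Q diag(d) Q^T
S, Si = Ah @ Q, Q.T @ Ahi                       # L = log(X A^{-1}) = S diag(log d) S^{-1}
L = S @ np.diag(np.log(d)) @ Si

# gamma(W) for W = L (x) I - I (x) L, through the same similarity transformation
gvals = gamma_fun(np.subtract.outer(np.log(d), np.log(d))).flatten()
gW = np.kron(S, S) @ np.diag(gvals) @ np.kron(Si, Si)

Xi, Ai = np.linalg.inv(X), np.linalg.inv(A)
J = -np.kron(Xi @ L, Xi) + np.kron(Ai, Xi) @ gW @ np.kron(np.eye(n), A @ Xi)

psi = lambda Y: np.linalg.solve(Y, np.real(logm(Y @ Ai)))
h = 1e-6
fd = (psi(X + h * E) - psi(X)) / h              # forward difference in direction E
print(np.linalg.norm(J @ E.flatten("F") - fd.flatten("F")))   # small, O(h)
```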

Recall that \(\varGamma(X)=\sum_{i=1}^{m}\psi(X,A_{i})\) and \(G^{-1}\sum_{i=1}^{m}\log(GA_{i}^{-1})=0\), for \(G=\sigma(t_{G})\). Then by Lemma 4.1, we obtain the following formula for the Jacobian \(J_{\varGamma}(G)\):

$$\begin{aligned} &J_\varGamma(G)= \bigl(I\otimes G^{-1}\bigr)H \bigl(I\otimes G^{-1}\bigr),\qquad H=\sum_{i=1}^mH_i, \\ &H_i=\bigl(A_i^{-1}\otimes I\bigr)\gamma \bigl(\log\bigl(GA_i^{-1}\bigr)\otimes I-I\otimes\log \bigl(GA_i^{-1}\bigr) \bigr) (I\otimes A_i). \end{aligned}$$

Moreover, by using the properties of the Kronecker product and the relation \(\log(GA^{-1})=A^{1/2}\log(A^{-1/2}GA^{-1/2})A^{-1/2}\), we can write

$$\begin{aligned} & H_i= \bigl(A_i^{-1/2}\otimes A_i^{1/2}\bigr) \gamma(\log M_i\otimes I-I \otimes\log M_i) \bigl(A_i^{-1/2}\otimes A_i^{1/2}\bigr), \\ &M_i=A_i^{-1/2}GA_i^{-1/2}. \end{aligned}$$

From this expression it turns out that \(H_{i}\) is positive, and from (4.2) we find that \(J_{S}(t_{G})\) is the product of the matrix \(V(\sigma(t_{G}))^{-1}\) and the matrix \(U^{T}(I\otimes G^{-1})\sum_{i=1}^{m} H_{i} (I\otimes G^{-1})U\), which is positive.

Thus we may conclude with the following

Theorem 4.1

The Jacobian K of the function S(t) in (4.2) at \(\sigma(t_{G})=G\) is given by

$$\begin{aligned} &K=V^{-1}U^T \bigl(I\otimes G^{-1} \bigr) H \bigl(I\otimes G^{-1} \bigr)U, \\ &H=\sum_{i=1}^m H_i,\quad H_i= \bigl(A_i^{-1/2}\otimes A_i^{1/2} \bigr) \gamma (\log M_i\otimes I-I \otimes\log M_i ) \bigl(A_i^{-1/2}\otimes A_i^{1/2} \bigr), \\ &M_i=A_i^{-1/2}GA_i^{-1/2}, \\ &\gamma(z)=z/\bigl(e^z-1\bigr). \end{aligned}$$

Moreover, the eigenvalues of K are the solutions of the equation

$$\det \bigl(\kappa V - U^T\bigl(I\otimes G^{-1}\bigr)H \bigl(I\otimes G^{-1}\bigr) U \bigr)=0. $$

4.2 An elementary preconditioner

The simplest choice for the preconditioner V(t) in (4.1) is \(V(t)=U^{T}U=D\). This corresponds to projecting the gradient of the function \(f(X;A_{1},\ldots,A_{m})\) onto the set \(\mathcal{U}\) according to the Euclidean scalar product. The problem \(\det(\kappa I-K)=0\) turns into the generalized q-dimensional symmetric eigenvalue problem

$$\det \bigl(U^T \bigl(\kappa I-\bigl(I\otimes G^{-1}\bigr)H \bigl(I\otimes G^{-1}\bigr) \bigr) U \bigr)=0. $$

This problem is the projection onto the space spanned by the columns of U of the problem \(\det(\nu I-(I\otimes G^{-1})H(I\otimes G^{-1}))=0\), which has real positive solutions.

Now we recall the following result, valid for general positive matrices A,B, which relates the generalized eigenvalues of the pair (A,B) to the ones of the projected pair \((U^{T}AU,U^{T}BU)\).

Lemma 4.2

Let A,B be positive n×n matrices and U an n×m matrix. Then the generalized eigenvalues of the pair \((U^{T}AU,U^{T}BU)\), which solve the equation \(\det(U^{T}(A-\kappa B)U)=0\), are real positive and lie in between the maximum and minimum eigenvalues λ of the pair (A,B), such that \(\det(A-\lambda B)=0\). Moreover, the extreme eigenvalues \(\lambda_{\min},\lambda_{\max}\) of the pair (A,B) are such that \(\alpha_{\min}/\beta_{\max}\le\lambda_{\min}\le\lambda_{\max}\le\alpha_{\max}/\beta_{\min}\), where \(\alpha_{\min},\alpha_{\max},\beta_{\min},\beta_{\max}\) are the minimum and maximum eigenvalues of the matrices A and B, respectively.

Proof

The condition \(\det(\lambda B-A)=0\) is equivalent to \(\det(\lambda I-B^{-1/2}AB^{-1/2})=0\), which has real positive solutions since \(B^{-1/2}AB^{-1/2}\) is positive. The remaining part of the lemma follows from the fact that the maximum and minimum eigenvalues of the larger and smaller problems coincide with the maximum and minimum values of the Rayleigh quotient \(x^{T}Ax/x^{T}Bx\) for \(x\in\mathbb{R}^{n}\setminus \{0\}\) and for \(x\in\operatorname{span}(U)\), respectively. □

A first consequence of the above lemma is that the extreme eigenvalues \(\kappa_{\min}\) and \(\kappa_{\max}\) of K are in between the minimum and the maximum eigenvalue of the \(n^{2}\)-dimensional symmetric matrix \(Y=(I\otimes G^{-1})H(I\otimes G^{-1})\), so that the ratio μ between the maximum and minimum eigenvalue of K is less than or equal to the condition number μ(Y) of the symmetric matrix Y. Moreover, since \(Y=\sum_{i=1}^{m} Y_{i}\) with

$$\begin{aligned} Y_i =&\bigl(A_i^{-1/2}\otimes A_i^{-1/2}\bigr) \bigl(I\otimes M_i^{-1} \bigr) \gamma(\log M_i\otimes I-I\otimes\log M_i) \\ &\times\bigl(I\otimes M_i^{-1}\bigr) \bigl(A_i^{-1/2}\otimes A_i^{-1/2} \bigr), \end{aligned}$$

one finds that \(\widehat{k}_{\min}:=\sum_{i=1}^{m}\lambda_{\min}^{(i)}\le\kappa_{\min}\) and \(\widehat{k}_{\max}:=\sum_{i=1}^{m}\lambda_{\max}^{(i)}\ge\kappa_{\max}\), where \(\lambda_{\min}^{(i)}\) and \(\lambda_{\max}^{(i)}\) are the minimum and the maximum eigenvalues of Y i . Moreover, from Lemma 4.2 and from the expression above for Y i it follows that \(\lambda^{(i)}_{\min}\ge\gamma^{(i)}_{\min}/(\alpha_{\max}^{(i)})^{2}\), \(\lambda^{(i)}_{\max}\le\gamma^{(i)}_{\max}/(\alpha_{\min}^{(i)})^{2}\), where \(\alpha_{\min}^{(i)}\), \(\alpha^{(i)}_{\max}\) are the minimum and the maximum eigenvalues of A i , respectively, while \(\gamma^{(i)}_{\min}\) and \(\gamma^{(i)}_{\max}\) are the minimum and maximum eigenvalues of \((I\otimes M_{i}^{-1}) \gamma(\log M_{i}\otimes I-I\otimes\log M_{i}) (I\otimes M_{i}^{-1})\), respectively.

From the properties of the matrix function γ(⋅) and from the properties of the Kronecker product one finds that the eigenvalues of the latter matrix can be explicitly given in terms of the eigenvalues \(\nu^{(i)}_{r}\) of the matrix M i . In fact, they coincide with \(\frac{1}{({\nu^{(i)}_{s}})^{2}}(\log t^{(i)}_{r,s})/(t^{(i)}_{r,s}-1)\) where \(t^{(i)}_{r,s}=\frac{\nu^{(i)}_{r}}{\nu^{(i)}_{s}}\).

Since the function \((\log t)/(t-1)\) is monotonically decreasing, its minimum and maximum over the values \(t^{(i)}_{r,s}\in[1/\mu^{(i)},\mu^{(i)}]\) are

$$\begin{aligned} \eta^{(i)}_{\min}&=\bigl(\log\mu^{(i)}\bigr)/\bigl( \mu^{(i)}-1\bigr), \\ \eta^{(i)}_{\max}&=\log\bigl(1/\mu^{(i)}\bigr)/ \bigl(1/\mu^{(i)}-1\bigr)=\mu ^{(i)}\bigl(\log\mu^{(i)} \bigr)/\bigl(\mu^{(i)}-1\bigr), \end{aligned}$$

for \(\mu^{(i)}=\mu(M_{i})\) the spectral condition number of \(M_{i}\). Additionally, taking the factor \((\nu^{(i)}_{s})^{-2}\) into consideration gives

$$\begin{aligned} \gamma^{(i)}_{\min}&\geq\eta^{(i)}_{\min} \bigl(\nu_{\max }^{(i)}\bigr)^{-2}, \\ \gamma^{(i)}_{\max}&\leq\eta^{(i)}_{\max} \bigl(\nu^{(i)}_{\min }\bigr)^{-2} \leq \mu^{(i)} \bigl(\nu^{(i)}_{\min}\bigr)^{-2}, \end{aligned}$$

where \(\nu_{\min}^{(i)}\) and \(\nu_{\max}^{(i)}\) denote the minimum and maximum eigenvalue of \(M_{i}\), respectively.

Therefore, we may conclude that the eigenvalues of K are bounded by \(\widetilde{\kappa}_{\min}:=\sum_{i=1}^{m} \eta^{(i)}_{\min}/(\nu ^{(i)}_{\max}\alpha^{(i)}_{\max})^{2}\) and \(\widetilde{\kappa}_{\max}:=\sum_{i=1}^{m} \eta^{(i)}_{\max}/(\nu ^{(i)}_{\min}\alpha^{(i)}_{\min})^{2}\).

Observe that this bound deteriorates when either some matrix \(A_{i}\) is ill-conditioned or some matrix \(A_{i}^{-1/2}GA_{i}^{-1/2}\) is ill-conditioned. The latter case cannot occur if the matrices \(A_{i}\) do not differ much from G. The dependence of this bound on the conditioning of the \(A_{i}\) makes this algorithm very inefficient when some \(A_{i}\) is ill-conditioned. This drawback is overcome in the next section, where we design a more effective preconditioner.

4.3 A preconditioner based on a differential geometric viewpoint

The Karcher mean for positive matrices admits a beautiful interpretation in terms of differential geometry. It can be considered as the center of mass for a well chosen inner product on the manifold of positive matrices. In this section and in Sect. 5 we consider two approaches inspired by this idea. For more information we refer to the overview in [25] and to the articles [15, 20, 36–38].

When considering a manifold optimization approach, the intersection \(\mathcal{U}\) of a linear space \(\mathcal{T}\) with the manifold of positive matrices \(\mathcal{P}_{n}\) can be viewed as a Riemannian submanifold of \(\mathcal{P}_{n}\) itself, which in turn is called the enveloping space. This entails that the inner product from this enveloping space is induced on the submanifold. An immediate consequence is that the gradient of the cost function for the submanifold is given by the orthogonal projection (with respect to the inner product) of the gradient for the enveloping space. Just as the space of symmetric matrices is the tangent space to the manifold of positive matrices, the intersection \(\mathcal{V}\) of the linear space \(\mathcal{T}\) with the space of symmetric matrices is the tangent space to \(\mathcal{U}\).

First consider the manifold of positive matrices endowed with the Euclidean inner product \(g_{X} ( A,B ) = \operatorname{tr}(A B)\), with A and B symmetric, and X a positive matrix. Note that even though this inner product \(g_{X}\) is independent of X, the subscript notation is kept for consistency. In this case, the orthogonal projection of a symmetric matrix A onto \(\mathcal{T}\) gives a matrix T, with

$$ \operatorname{vec}(T)= U \bigl(U^T U \bigr)^{-1} U^T \operatorname{vec}(A), $$

or \(\operatorname{vec}(T)=Ut\), with

$$ t= \bigl(U^T U \bigr)^{-1} U^T \operatorname{vec}(A). $$
(4.6)

The expression for the gradient of the Karcher cost function, corresponding to the Euclidean inner product, is known for the manifold of positive matrices and is given by

$$\begin{aligned} \operatorname{grad}_{e} f (X;A_1, \ldots, A_m) =& 2 X^{-1} \sum_{i=1}^m \log \bigl(X A_i^{-1} \bigr) \\ =& 2 X^{-1/2} \sum_{i=1}^m \log \bigl(X^{1/2} A_i^{-1} X^{1/2} \bigr) X^{-1/2}. \end{aligned}$$
(4.7)

The gradient naturally defines the direction of steepest ascent. Nevertheless, the gradient lies in the tangent space, and to build an algorithm from this, a practical way is to follow the gradient and then go back to the manifold through a suitable function, called a retraction. The precise definition of a retraction, together with the general theoretical assumptions it should satisfy, can be found in [1]. Figure 2(a) graphically illustrates the concept of a retraction, where a vector \(\xi_{X}\) in the tangent space \(T_{X}\mathcal{P}_{n}\) of the positive matrices is retracted to a point \(R_{X}(\xi_{X})\) residing on the manifold \(\mathcal{P}_{n}\). The classical steepest descent algorithm on a manifold is graphically depicted in Fig. 2(b). The thin red lines depict the contour lines, the blue arrows the gradients, and the green curves the retractions to the manifold.

Fig. 2 Graphical representation of a retraction and steepest descent flow

Observe that for \(\mathcal{P}_{n}\) immersed in the set of symmetric matrices, the tangent space at a point is the whole set of symmetric matrices. So one can consider the basic retraction \(R_{X}(A)=X+A\) for a sufficiently small symmetric matrix A.

Plugging the gradient (4.7) into the projection (4.6) and applying a gradient descent method with the basic retraction \(R_{X}(A)=X+A\), we arrive exactly at the Richardson-like algorithm (4.1) for finding the fixed points of the function φ.

However, since the function f to be minimized is defined through the distance (1.2), it is more natural to consider the manifold of positive matrices endowed with the inner product \(g_{X} ( A,B ) = \operatorname{tr} (A X^{-1} B X^{-1} )\), with A, B and X as before. In this case, the gradient for the enveloping space is known to be

$$\begin{aligned} \operatorname{grad}_{n} f (X;A_1, \ldots, A_m) =& 2 X \sum_{i=1}^m \log \bigl( A_i^{-1} X \bigr) \\ =& 2 X^{1/2} \sum_{i=1}^m \log \bigl(X^{1/2} A_i^{-1} X^{1/2} \bigr) X^{1/2}. \end{aligned}$$

Note the difference with (4.7).

The orthogonal projection T onto the intersection \(\mathcal{V}\) (of \(\mathcal{T}\) and the space of symmetric matrices) of this gradient, with respect to the Riemannian scalar product, can be found as the solution of the equations

$$\begin{aligned} &\operatorname{grad}_{n} f (X) = T + S , \\ &g_X ( S,K ) = \operatorname{tr} \bigl(S X^{-1} K X^{-1} \bigr) = 0, \quad \mbox{for every } K \in\mathcal{V}. \end{aligned}$$

Writing again \(\operatorname{vec}(T)=Ut\), we find in parameter space

$$ t= \bigl( U^T \bigl( X^{-1} \otimes X^{-1} \bigr) U \bigr)^{-1} U^T \bigl( X^{-1} \otimes X^{-1} \bigr) \operatorname {vec}\bigl( \operatorname{grad}_{n} f (X)\bigr). $$
(4.8)

The factor \(U^{T}(X^{-1}\otimes X^{-1})U\) is recurring and is abbreviated as \(D_{X}\), where the subscript points to the intrinsic variable X. Observe that this Riemannian orthogonal projection can be seen as a Euclidean oblique projection where the two bases of the subspace are the columns of U and of \((X^{-1}\otimes X^{-1})U\), respectively.

Using this expression, it is possible to define another gradient descent method where we are now searching the fixed points of the function

$$ \varphi(t)=t - \theta D_{\sigma(t)}^{-1} U^T \bigl( \sigma(t)^{-1} \otimes\sigma(t)^{-1} \bigr) \operatorname{vec} \Biggl(\sigma(t) \sum_{i=1}^m \log \bigl( A_i^{-1} \sigma(t) \bigr) \Biggr). $$
(4.9)

Relying on (1.6) to incorporate the Kronecker product into the vectorization, we find that \((\sigma^{-1}\otimes\sigma^{-1})\operatorname{vec}(\sigma\sum_{i=1}^{m}\log(A_{i}^{-1}\sigma))= \operatorname{vec}(\sum_{i=1}^{m}\log(A_{i}^{-1}\sigma)\sigma^{-1})\). Applying a property of the matrix logarithm we may rewrite the latter expression as \(\operatorname{vec}(\sigma^{-1}\sum_{i=1}^{m}\log(\sigma A_{i}^{-1}))\). This way, (4.9) takes the form of (4.1) with

$$V( \sigma(t) ) = U^T \bigl( \sigma(t)^{-1} \otimes \sigma(t)^{-1}\bigr)U. $$
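In the simplest (dense) setting, this preconditioner can be formed explicitly; the following sketch is an illustration only and builds the full Kronecker product, while Sect. 4.4 below discusses how to avoid that in the Toeplitz case.

```python
# Minimal sketch: the preconditioner V(X) = U^T (X^{-1} kron X^{-1}) U of Sect. 4.3.
import numpy as np

def riemannian_preconditioner(X, U):
    Xi = np.linalg.inv(X)
    return U.T @ np.kron(Xi, Xi) @ U
```

Used inside the iteration sketched after (4.1), this choice of V leads to the convergence behaviour analyzed next.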

To analyze the convergence of (4.1) with this choice for V(σ(t)), we have to analyze the eigenvalues of the Jacobian \(K=J_{S}(t_{G})\) of S(t) in (4.1), where the equation \(\det(\kappa I-K)=0\) takes the form of the following generalized eigenvalue problem

$$ \det \bigl(U^T \bigl(\kappa\bigl(G^{-1}\otimes G^{-1}\bigr)- \bigl(I\otimes G^{-1}\bigr) H\bigl(I\otimes G^{-1}\bigr) \bigr)U \bigr)=0. $$
(4.10)

Since the two matrices in (4.10) are positive, in view of Lemma 4.2, the solutions of this generalized eigenvalue problem are real positive and are located in between the maximum and the minimum solution of the larger problem

$$\det \bigl(\lambda\bigl(G^{-1}\otimes G^{-1}\bigr)- \bigl(I \otimes G^{-1}\bigr) H\bigl(I\otimes G^{-1}\bigr) \bigr)=0, $$

which in turn can be rewritten as a standard eigenvalue problem

$$\det \bigl(\lambda I -\bigl(G^{1/2}\otimes G^{1/2}\bigr) \bigl(I\otimes G^{-1}\bigr) H \bigl(I\otimes G^{-1}\bigr) \bigl(G^{1/2}\otimes G^{1/2}\bigr) \bigr)=0. $$

Since \(H=\sum_{i=1}^{m} H_{i}\), and the matrices H i are real symmetric, the eigenvalues of this problem are located in between the sum of the minimum and the sum of the maximum eigenvalues of each subproblem

$$ \det \bigl(\lambda I - \bigl(G^{1/2}\otimes G^{1/2}\bigr) \bigl(I\otimes G^{-1}\bigr) H_i \bigl(I\otimes G^{-1}\bigr) \bigl(G^{1/2}\otimes G^{1/2}\bigr) \bigr)=0, $$
(4.11)

that is, \(\det(\lambda(G^{-1}\otimes G)-H_{i})=0\), or equivalently \(\det(\lambda I-(G\otimes I)H_{i}(I\otimes G^{-1}))=0\). The matrix in the latter expression is similar to \((A_{i}^{-1/2}\otimes A_{i}^{-1/2}) (G\otimes I)H_{i}(I\otimes G^{-1}) (A_{i}^{1/2}\otimes A_{i}^{1/2})\), which, using the expression of \(H_{i}\) provided in Theorem 4.1, can be written as

$$(M_i\otimes I) \gamma( \log M_i\otimes I-I\otimes\log M_i) \bigl(I\otimes M_i^{-1}\bigr). $$

This way, the eigenvalues of (4.11) can be explicitly given in terms of the eigenvalues \(\nu^{(i)}_{r}\) of the matrix M i . In fact, they coincide with the \(t^{(i)}_{r,s}(\log t^{(i)}_{r,s})/(t^{(i)}_{r,s}-1)\) where \(t^{(i)}_{r,s}=\frac{\nu^{(i)}_{r}}{\nu^{(i)}_{s}}\).

Since the function \(t(\log t)/(t-1)\) is monotone, for the minimum and maximum solutions of (4.11) we have

$$\begin{aligned} \eta^{(i)}_{\min}&= \bigl(1/\mu^{(i)}\bigr) \log \bigl(1/\mu^{(i)}\bigr)/\bigl(1/\mu ^{(i)}-1\bigr)=\bigl(\log \mu^{(i)}\bigr)/\bigl(\mu^{(i)}-1\bigr), \\ \eta^{(i)}_{\max}&=\mu^{(i)}\bigl(\log \mu^{(i)}\bigr)/\bigl(\mu^{(i)}-1\bigr), \end{aligned}$$

respectively, for \(\mu^{(i)}=\mu(M_{i})\) the spectral condition number of \(M_{i}\). Therefore, we may conclude that the eigenvalues of K lie between \(\sum_{i=1}^{m} \eta^{(i)}_{\min}\) and \(\sum_{i=1}^{m} \eta^{(i)}_{\max}\). This way, we find for the optimal value of θ and for the optimal spectral radius the estimates

$$\begin{aligned} &\theta=\frac{2}{\sum_{i=1}^m \frac{\mu^{(i)}+1}{\mu^{(i)}-1}\log \mu^{(i)}}, \\ & \rho=\frac{\sum_{i=1}^m \log\mu^{(i)}}{\sum_{i=1}^m \frac{\mu ^{(i)}+1}{\mu^{(i)}-1}\log\mu^{(i)}}. \end{aligned} $$

It is interesting to point out that in this case the convergence speed is related neither to the condition number of the geometric mean G nor to those of the matrices \(A_{i}\), but only to the relative distances of G from each \(A_{i}\), measured by the quantities \(\mu^{(i)}=\mu(M_{i})\), \(M_{i}=A_{i}^{-1/2}GA_{i}^{-1/2}\). The closer they are to 1, the faster the convergence. Therefore, if the matrices to be averaged are not too far from each other, so that the quantities \(\mu(M_{i})\) are close to 1, then the optimal value of θ is close to 1/m and very fast convergence is expected. This analysis is confirmed by the numerical experiments.
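These estimates are straightforward to evaluate once G (or a good approximation of it) is available; the sketch below (ours, not the authors') computes θ and ρ from the condition numbers \(\mu^{(i)}\) of \(A_{i}^{-1/2}GA_{i}^{-1/2}\).

```python
# Minimal sketch of the step-size and spectral-radius estimates of Sect. 4.3.
import numpy as np
from scipy.linalg import eigvalsh

def theta_and_rho(G, As):
    mus = []
    for A in As:
        lam = eigvalsh(G, A)                   # eigenvalues of A^{-1/2} G A^{-1/2}
        mus.append(lam.max() / lam.min())
    mus = np.array(mus)
    w = np.full_like(mus, 2.0)                 # limit of (mu+1)/(mu-1)*log(mu) as mu -> 1
    big = mus > 1.0 + 1e-14
    w[big] = (mus[big] + 1) / (mus[big] - 1) * np.log(mus[big])
    return 2.0 / w.sum(), np.sum(np.log(mus)) / w.sum()   # theta, rho
```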

4.4 The case of Toeplitz matrices

From the computational point of view, at each step of the iteration (4.1) one has to compute \(U^{T}\operatorname{vec}(\varGamma(\sigma(t)))\) and then solve a linear system with the matrix V(σ(t)). The former computation, based on (3.6), requires \(O(mn^{3})\) arithmetic operations (ops), while the cost of the latter depends on the structure of V(σ(t)).

In this section we examine the case where \(\mathcal{U}\) is the class of symmetric Toeplitz matrices and where σ(t) associates t with the Toeplitz matrix having as first column t. We describe a way to make the algorithm of Sect. 4.3 more efficient.

Indeed, for the iteration analyzed in Sect. 4.2, V is the diagonal matrix with diagonal entries (n,2n−2,…,2) and the cost of solving a system with matrix V amounts to n divisions.

The iteration examined in Sect. 4.3 has a higher convergence speed, but at each step an n×n system with \(V=U^{T}(X^{-1}\otimes X^{-1})U\) must be solved, where X is a symmetric positive definite Toeplitz matrix.

We split the computation into two steps. In the first, the \(n^{2}\) entries of V are computed; in the second, a standard \(O(n^{3})\)-ops linear system solver is used. Concerning the first step, we discuss two approaches.

In both approaches the inverse of the Toeplitz matrix X needs to be computed, which can be done efficiently using the Gohberg–Semencul formula [13]. Here, vectors \(v_{1}\), \(v_{2}\), \(v_{3}\), \(v_{4}\) are determined such that \(X^{-1}=L(v_{1})L(v_{2})^{T}-L(v_{3})L(v_{4})^{T}\), where L(v) is the lower triangular Toeplitz matrix whose first column is v. From these, the \(n^{2}\) entries of \(X^{-1}\) can be found. The overall cost is \(O(n^{2})\) ops.

  1.

    As a first attempt, the entries of V are computed in a straightforward manner using the entries of \(X^{-1}\):

    $$V=\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c} \gamma_{1,1} & 2\gamma_{1,2} & \cdots& 2\gamma_{1,n} \\ 2\gamma_{1,2} & 2\gamma_{2,2} & \cdots& 2\gamma_{2,n} \\ \vdots & \vdots& \ddots& \vdots\\ 2\gamma_{1,n} & 2\gamma_{2,n} & \cdots& 2\gamma_{n,n} \end{array} \right ], $$

    where

    $$\begin{aligned} & \gamma_{1,j}=\sum_{i=1}^n \sum _{k=1}^{n-j+1} \bigl(X^{-1} \bigr)_{i,k} \bigl(X^{-1}\bigr)_{i,k+j-1}, \\ & \gamma_{j,p}= \sum_{i=1}^{n-j+1} \sum_{k=1}^{n-p+1} \bigl( \bigl(X^{-1} \bigr)_{i,k} \bigl(X^{-1}\bigr)_{i+j-1,k+p-1} + \bigl(X^{-1}\bigr)_{i,k+p-1} \bigl(X^{-1} \bigr)_{i+j-1,k} \bigr). \end{aligned}$$

    The cost of this approach in terms of arithmetic operations is of the order \(O(n^{4})\).

  2.

    In the second approach, we show that the cost of this computation can be kept at the level of \(O(n^{3}\log n)\) ops by combining the Gohberg–Semencul formula and the FFT. For a given i, the product vector \(w_{i}=(X^{-1}\otimes X^{-1})Ue_{i}\), where \(e_{i}\) is the ith vector of the canonical basis, is such that \(w_{i}=\operatorname{vec}(X^{-1}E_{i}X^{-1})\), with \(E_{i}\) being the symmetric Toeplitz matrix whose first column is \(e_{i}\). Therefore, compute first the columns of \(E_{i}X^{-1}\) by performing \(O(n^{2})\) additions, and then multiply \(X^{-1}\) by these columns, stacking the results to obtain \(w_{i}\). This computation is performed in \(O(n^{2}\log n)\) operations for each i by using the Gohberg–Semencul formula, since the multiplication of a lower triangular Toeplitz matrix and a vector can be performed in \(O(n\log n)\) operations by means of the FFT [13]. Therefore the overall computation of this stage for i=1,…,n is \(O(n^{3}\log n)\) ops. Finally, compute for any i the vector \(U^{T}w_{i}\) at the cost of \(O(n^{2})\) additions. A dense-arithmetic sketch of this column-by-column construction, without the FFT acceleration, is given after this list.
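The sketch below (our own illustration, in dense arithmetic and without the Gohberg–Semencul/FFT acceleration) shows the column-by-column construction underlying the second approach.

```python
# Minimal sketch: build V = U^T (X^{-1} kron X^{-1}) U column by column, where the
# i-th column of (X^{-1} kron X^{-1}) U equals vec(X^{-1} E_i X^{-1}).
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_precond_dense(X):
    n = X.shape[0]
    Xi = np.linalg.inv(X)                     # Sect. 4.4 computes this via Gohberg-Semencul
    U = np.zeros((n * n, n))
    W = np.empty((n * n, n))
    for i in range(n):
        Ei = toeplitz(np.eye(n)[:, i])        # symmetric Toeplitz with e_i as first column
        U[:, i] = Ei.flatten(order="F")
        W[:, i] = (Xi @ Ei @ Xi).flatten(order="F")   # vec(X^{-1} E_i X^{-1})
    return U.T @ W                            # the n x n matrix V
```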

The performance of these methods will be compared in Sect. 6.

5 Kähler metric mean for Toeplitz matrices

The Karcher mean of positive definite matrices has the specific interpretation of being the barycenter of the given matrices for the natural metric (1.2) on this manifold. Hence there are, in a certain sense, two possible generalizations: on the one hand, one can try to generalize the geometric mean concept; on the other hand, one can try to generalize the barycenter concept. Previously we focused on an extension of the geometric mean. Hereafter we focus on the manifold of positive definite Toeplitz matrices itself, denoted by \(\mathcal{T}_{n}\), and consider a barycenter in this case. This mean cannot be called a geometric mean in the sense of satisfying all required properties, but, through its intuitive definition, many desirable properties could still arise.

The concept of a barycenter is not restricted to the specific metric used to define the Karcher mean. For example, when the set \(\mathcal{T}_{n}\) is endowed with the classical Euclidean inner product, the resulting barycenter is nothing else than the arithmetic mean. Using a probability argument, in [3, 4] a metric on \(\mathcal{T}_{n}\) is introduced, called the Kähler metric. This metric results in a complete, simply connected manifold with non-positive sectional curvature everywhere, that is, a Cartan–Hadamard manifold. Thus, by [17, 27], existence and uniqueness of the barycenter with respect to this metric are guaranteed.

We will recall some known facts about the Kähler metric, and then we will give an explicit formula for the barycenter in the real case and a numerical procedure to compute the barycenter in the complex case.

To construct the Kähler metric, a Toeplitz matrix is first transformed to an n-tuple \((P_0,\mu_1,\ldots,\mu_{n-1})\) in \(\mathbb{R}_{+}^{*}\times\mathbb{D}^{n-1}\), with \(\mathbb{R}_{+}^{*}\) the set of strictly positive real numbers and \(\mathbb{D}\) the set of complex numbers of modulus less than one. This transformation, denoted by \(\zeta(T)=[P_T,\mu_{T,1},\ldots,\mu_{T,n-1}]^T\), is performed as follows:

$$P_T = t_0,\qquad \mu_{T,j} = (-1)^j \frac{\det(S_j)}{\det(R_j)}, $$

with \(t_0\) the main diagonal element of T, \(R_j\) the principal submatrix of size j of T (the upper left j×j submatrix) and \(S_j\) obtained by shifting \(R_j\) down one row or, equivalently, by removing the first row and last column of \(R_{j+1}\) (the inverse transformation can be found in [42]). In what follows, we use this one-to-one relation between the Toeplitz matrices and the corresponding n-tuples, and, when clear from the context, we will neglect the distinction and identify one with the other.
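A direct transcription of this transformation is sketched below (illustrative names; the determinants are computed naively for clarity, although a Levinson-type recursion could be used instead).

```python
import numpy as np

def toeplitz_to_tuple(T):
    """Map a positive Toeplitz matrix T to the tuple (P_T, mu_1, ..., mu_{n-1})
    in R_+^* x D^{n-1}, using mu_j = (-1)^j det(S_j)/det(R_j), with R_j the
    leading j x j submatrix of T and S_j equal to R_{j+1} without its first
    row and last column."""
    n = T.shape[0]
    P_T = T[0, 0].real
    mu = np.empty(n - 1, dtype=complex)
    for j in range(1, n):
        R_j = T[:j, :j]
        S_j = T[1:j + 1, :j]        # shift R_j down by one row within T
        mu[j - 1] = (-1) ** j * np.linalg.det(S_j) / np.linalg.det(R_j)
    return P_T, mu
```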

Let \(X=[P_X,\mu_{X,1},\ldots,\mu_{X,n-1}]^T\) and \(Y=[P_Y,\mu_{Y,1},\ldots,\mu_{Y,n-1}]^T\) be the transforms of two positive Toeplitz matrices. The metric is then given by

$$\begin{aligned} &\operatorname{d}(X,Y) = \Biggl( n \sigma(P_X,P_Y)^2 + \sum_{j=1}^{n-1} (n-j)\tau( \mu_{X,j},\mu_{Y,j})^2 \Biggr)^{1/2}, \\ &\sigma(P_X,P_Y) = \biggl| \log \biggl(\frac{P_Y}{P_X} \biggr) \biggr|, \qquad \tau(\mu_{X,j},\mu_{Y,j}) = \mathrm{atanh} \biggl( \biggl|\frac{\mu _{Y,j}-\mu_{X,j}}{1-\overline{\mu}_{X,j}\,\mu_{Y,j}} \biggr| \biggr), \end{aligned}$$

where \(\operatorname{atanh}(z)=\frac{1}{2}\log (\frac {1+z}{1-z} )\).
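Given two tuples obtained from a transformation such as the hypothetical toeplitz_to_tuple above, the distance can be evaluated by a direct transcription of these formulas (a sketch):

```python
import numpy as np

def kahler_distance(P_X, mu_X, P_Y, mu_Y):
    """Kaehler-metric distance between the tuples (P_X, mu_X) and (P_Y, mu_Y),
    where mu_X and mu_Y are arrays of the n-1 coefficients in the unit disk."""
    n = len(mu_X) + 1
    sigma = abs(np.log(P_Y / P_X))
    tau = np.arctanh(np.abs((mu_Y - mu_X) / (1 - np.conj(mu_X) * mu_Y)))
    weights = n - np.arange(1, n)               # the factors (n - j), j = 1..n-1
    return np.sqrt(n * sigma**2 + np.sum(weights * tau**2))
```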

The barycenter of the positive Toeplitz matrices \(T_i\), for i=1,…,m, with respect to the Kähler metric will be denoted by \(B(T_1,\ldots,T_m)=[P_B,\mu_{B,1},\ldots,\mu_{B,n-1}]^T\). It is obtained in this transformed space by minimizing the function

$$\begin{aligned} f(X) &= \sum_{i=1}^m \operatorname{d}^2(X,T_i) \end{aligned}$$

over \(\mathbb{R}_{+}^{*} \times\mathbb{D}^{n-1}\). Notice that the problem of minimizing f(X) can be decoupled into the problems of minimizing \(\varphi_{0}(x)=\sum_{i=1}^{m} \sigma(x,P_{T_{i}})^{2}\) over \(\mathbb{R}_{+}^{*}\), and the n−1 scalar functions

$$\varphi_j(z)=\sum_{i=1}^m \tau(z,\mu_{T_i,j})^2,\quad j=1,\ldots,n-1 $$

over \(\mathbb{D}\). The minimizer of \(\varphi_0(x)\) is easily obtained as \(P_{B}=(P_{T_{1}}\cdots P_{T_{m}})^{1/m}\) by solving the equation \(\nabla\varphi_0(x)=0\). The minimizer of \(\varphi_j(z)\) is nothing else than the barycenter of \(\mu _{T_{1},j},\ldots,\mu_{T_{m},j}\) with respect to the customary Poincaré metric on the unit disk and is the point where the gradient

$$ \nabla\varphi_j(z)= 2\bigl(|z|^2 - 1 \bigr) \sum_{i=1}^m\operatorname {sign}(c_{i,j})\operatorname{atanh}\bigl(|c_{i,j}|\bigr),\quad c_{i,j}=\frac{\mu_{{T_i},j}-z}{1-\overline{z}\mu_{{T_i},j}}, $$
(5.1)

equals zero.

In the real case we are able to find an explicit expression for this barycenter as well, since \(\operatorname{sign}(c)\operatorname {atanh}(|c|)=\operatorname{atanh}(c)\) and after some manipulations we get

$$\mu_{B,j}=\mathcal{C} \bigl( \bigl(\mathcal{C}(\mu_{T_1,j}) \cdots \mathcal{C}(\mu_{T_m,j}) \bigr)^{1/m} \bigr), $$

where \(\mathcal{C}(z)=(1-z)/(1+z)\) is the Cayley transform.
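For completeness, the omitted manipulations can be reconstructed as follows (a sketch for the real case, based on the addition formula \(\operatorname{atanh}(a)-\operatorname{atanh}(b)=\operatorname{atanh} (\frac{a-b}{1-ab} )\)). Setting the gradient (5.1) equal to zero amounts to

$$0=\sum_{i=1}^m \operatorname{atanh} \biggl(\frac{\mu_{T_i,j}-z}{1-z\mu_{T_i,j}} \biggr) =\sum_{i=1}^m \bigl(\operatorname{atanh}(\mu_{T_i,j})-\operatorname{atanh}(z) \bigr), $$

that is, \(\operatorname{atanh}(z)=\frac{1}{m}\sum_{i=1}^m\operatorname{atanh}(\mu_{T_i,j})\). Since \(\operatorname{atanh}(z)=-\frac{1}{2}\log\mathcal{C}(z)\), exponentiating gives \(\mathcal{C}(z)=(\mathcal{C}(\mu_{T_1,j})\cdots\mathcal{C}(\mu_{T_m,j}))^{1/m}\), and applying the involution \(\mathcal{C}\) once more yields the stated expression for \(\mu_{B,j}\).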

In the complex case we were not able to find such an explicit formula, but a quick numerical method can be devised using a gradient descent algorithm. We recall that the tangent space to the Poincaré disk can be identified with the complex plane, and thus for a sufficiently small tangent vector \(v\in\mathbb{C}\) one can consider the retraction \(R_z(v)=z+v\), which captures the fact that the manifold is an open subset of the complex plane. The resulting algorithm to find the barycenter of \(\mu_1,\ldots,\mu_m\in\mathbb{D}\) is given by the iteration

$$ \begin{aligned} &z_{k+1}=z_k+t_k v_k, \\ &v_k=\bigl(1-|z_k|^2\bigr)\sum_{i=1}^m \operatorname{sign}(c_{i,k}) \operatorname{atanh}(|c_{i,k}|), \\ &c_{i,k}=\frac{\mu_i-z_k}{1-\overline{z}_k\mu_i}, \end{aligned} $$
(5.2)

for a suitable initial value \(z_0\) and a sufficiently small step length \(t_k\).
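A minimal sketch of iteration (5.2) is given below; the starting point and the fixed step length are illustrative choices, not prescribed above, and \(\operatorname{sign}(c)\) is interpreted for complex c as the unit number c/|c|.

```python
import numpy as np

def disk_barycenter(mu, t=0.1, tol=1e-12, maxit=10000):
    """Barycenter of the points mu (all of modulus < 1) in the Poincare disk,
    computed with the gradient iteration (5.2) and the retraction z + t*v."""
    mu = np.asarray(mu, dtype=complex)
    z = mu.mean()                               # illustrative starting point
    for _ in range(maxit):
        c = (mu - z) / (1 - np.conj(z) * mu)
        r = np.abs(c)
        unit = np.divide(c, r, out=np.zeros_like(c), where=r > 0)   # "sign" of c
        v = (1 - abs(z) ** 2) * np.sum(unit * np.arctanh(r))
        if abs(v) < tol:
            break
        z = z + t * v
    return z
```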

Another possibility is to consider the retraction

$$R_z(v)=\frac{z+e^{i \theta}+(z-e^{i \theta})e^{-s}}{ 1+\overline{z}e^{i \theta}+(1-\overline{z}e^{i \theta})e^{-s}},\quad \theta= \arg v, \ s= \frac{2|v|}{1-|z|^2}, $$

which corresponds to moving along the geodesics of the Poincaré disk. The corresponding gradient descent method is

$$z_{k+1}=R_{z_k}(t_k v_k), $$

with the same \(v_k\) as in (5.2).
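In a sketch, this retraction simply replaces the Euclidean update \(z_k+t_kv_k\) in the iteration; the function name below is illustrative.

```python
import numpy as np

def geodesic_retraction(z, v):
    """Move from z along the Poincare-disk geodesic determined by the tangent
    vector v, following the formula above; for v = 0 the point z is returned."""
    if v == 0:
        return z
    e = np.exp(1j * np.angle(v))                 # e^{i theta}, theta = arg v
    d = np.exp(-2 * abs(v) / (1 - abs(z) ** 2))  # e^{-s}
    return (z + e + (z - e) * d) / (1 + np.conj(z) * e + (1 - np.conj(z) * e) * d)

# In the loop of disk_barycenter, replace  z = z + t * v  by
# z = geodesic_retraction(z, t * v)  to follow the geodesics instead.
```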

5.1 Properties of the Kähler barycenter

Regarding the properties of this barycenter, it is easily seen that it is permutation invariant, repetition invariant and idempotent (as is any barycenter). Moreover, for any α>0, the transformed tuple of αT is \([\alpha P_T,\mu_{T,1},\ldots,\mu_{T,n-1}]^T\), and from the explicit expression of \(P_B\) in the real case we get \(B(\alpha T_1,T_2,\ldots,T_m)=\alpha^{1/m}B(T_1,\ldots,T_m)\), that is, homogeneity holds.

Unfortunately, this new barycenter does not possess other properties as shown by the following example.

Example 5.1

From the explicit expression for the mean in the real case we get a simple formula for the Kähler barycenter of two 2×2 real matrices

$$T_1=\left [ \begin{array}{c@{\quad}c} x_1 & y_1 \\ y_1 & x_1 \end{array} \right ],\qquad T_2=\left [ \begin{array}{c@{\quad}c} x_2 & y_2 \\ y_2 & x_2 \end{array} \right ], $$

namely

$$B(T_1,T_2)=\sqrt{x_1x_2} \left [ \begin{array}{c@{\quad}c} 1 & \frac{a-b}{a+b } \\ \frac{a-b}{a+b} & 1 \end{array} \right ],\quad \mbox{with } \left\{ \begin{array}{l} a=\sqrt{ (x_1+y_1)(x_2+y_2) } \\ b=\sqrt{(x_1-y_1)(x_2-y_2)} \end{array} \right.. $$

Now consider the following matrices

$$T_1=\left [ \begin{array}{c@{\quad}c} 2 & 1 \\ 1 & 2 \end{array} \right ],\qquad \widetilde{T}_1=\left [ \begin{array}{c@{\quad}c} 4 & -1 \\ -1 & 4 \end{array} \right ], \qquad T_2=\left [ \begin{array}{c@{\quad}c} 2 & -1 \\ -1 & 2 \end{array} \right ], $$

with \(\widetilde{T}_{1}\ge T_{1}\). By symbolic computation, one gets that

$$B(\widetilde{T}_1,T_2)=\left [ \begin{array}{c@{\quad}c} 2\sqrt{2} & \sqrt{2}(\sqrt{5}-3) \\ \sqrt{2}(\sqrt{5}-3) & 2\sqrt{2} \end{array} \right ]\not\ge B(T_1,T_2)=\left [ \begin{array}{c@{\quad}c} 2 & 0 \\ 0 & 2 \end{array} \right ], $$

in fact one eigenvalue of \(B(\widetilde{T}_{1},T_{2})-B(T_{1},T_{2})\) is \(\lambda=\sqrt{10}-2-\sqrt{2}<0\). Thus, we have proved that the Kähler barycenter is not monotonic. Moreover,

$$B(T_1,T_2)\neq(T_1T_2)^{1/2}= \left [ \begin{array}{c@{\quad}c} \sqrt{3} & 0 \\ 0 & \sqrt{3} \end{array} \right ], $$

and hence the Kähler barycenter does not coincide with the Karcher mean for circulant matrices. In particular, it is not a structured geometric mean as defined in Sect. 1.

Observe that in the previous example \(B(T_1,T_2)\) surprisingly coincides with the arithmetic mean of \(T_1\) and \(T_2\). However, it is not difficult to construct examples where the inequality \(B(T_1,T_2)\le(T_1+T_2)/2\), which should hold for a geometric mean, fails.
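The quantities appearing in Example 5.1 are easy to check numerically; the short standalone script below (illustrative, based on the closed-form two-matrix expression above) reproduces them.

```python
import numpy as np
from scipy.linalg import sqrtm

def kahler_barycenter_2x2(T1, T2):
    """Kaehler barycenter of two real symmetric 2x2 Toeplitz matrices,
    using the closed-form expression of Example 5.1."""
    (x1, y1), (x2, y2) = (T1[0, 0], T1[0, 1]), (T2[0, 0], T2[0, 1])
    a = np.sqrt((x1 + y1) * (x2 + y2))
    b = np.sqrt((x1 - y1) * (x2 - y2))
    r = (a - b) / (a + b)
    return np.sqrt(x1 * x2) * np.array([[1, r], [r, 1]])

T1  = np.array([[2., 1.], [1., 2.]])
T1t = np.array([[4., -1.], [-1., 4.]])          # the matrix denoted by tilde T_1
T2  = np.array([[2., -1.], [-1., 2.]])

print(kahler_barycenter_2x2(T1, T2))            # [[2, 0], [0, 2]] = (T1 + T2)/2
print(np.linalg.eigvalsh(kahler_barycenter_2x2(T1t, T2)
                         - kahler_barycenter_2x2(T1, T2)))  # smallest = sqrt(10)-2-sqrt(2) < 0
print(sqrtm(T1 @ T2))                           # sqrt(3)*I, differs from the barycenter
```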

6 Numerical experiments

In this section, the different algorithms proposed in Sects. 4.2 and 4.3 will be compared with respect to speed and accuracy. The numerical experiments are confined to Toeplitz matrices, because of the interest in computing their structured matrix mean in applications [29]. These matrices are constructed randomly, but with a prescribed condition number, using the technique described in [18]. Performance, accuracy and computational distance are the subjects of the forthcoming investigation. For clarity we remind the reader that the Richardson iteration corresponds to a projection technique on the manifold endowed with the classical Euclidean inner product. For all algorithms, the stopping criterion is based on checking the relative size of the gradient and on comparing two consecutive iterates.

It is worth pointing out that, in spite of the lack of a proof of uniqueness of the structured geometric mean in the Toeplitz case, for any fixed set of data matrices used in our experiments, every initial value and every algorithm always yielded the same structured geometric mean. This suggests the conjecture that in the Toeplitz case the structured geometric mean is unique.

We have also compared the structured geometric mean obtained by our algorithms with the Kähler metric mean, obtaining in most experiments a relative difference of the order of \(10^{-1}\), which indicates that these two means are relatively far from each other.

6.1 The projection methods

Performance

The performance of the projection methods explained in Sect. 4 can be compared by looking at both the number of iterations the methods require and the total amount of computational time they need.

In Fig. 3(a), the evolution of the gradient over the iterations is displayed for both techniques (and hence also the number of iterations). Using the projection method introduced in Sect. 4.3 gives a faster decrease of the gradient and results in fewer iteration steps. The number of iterations remains almost constant for this method as the size of the matrices increases. For the projection technique from Sect. 4.2 on the other hand, this number starts to increase when the matrix size grows.

Fig. 3 Comparison of the projection methods for Toeplitz matrices. In the legends, Euclidean indicates the method of Sect. 4.2, Riemannian indicates the first approach described in Sect. 4.4, and Riem-FFT the second

However, comparing expressions (4.6) and (4.8), it can be seen that the second one is computationally more expensive, and hence the advantage of requiring fewer iterations could be nullified. Therefore, Fig. 3(b) displays the total computational time of both methods for varying matrix sizes (both approaches from Sect. 4.4 are shown). The two methods based on Sect. 4.3 maintain an advantage despite their larger computational cost per iteration. Note that for the largest matrix size the computational time of the Euclidean-based method appears lower than that of the other methods. However, this is caused by the increasing number of iterations required by the Euclidean method: the maximum number of iterations is reached before convergence and the algorithm is terminated prematurely. Concerning the operation counts in Sect. 4.4, the advantage of the FFT-based method starts to appear when the matrices become sufficiently large.

Accuracy

In order to analyze the accuracy of the projection methods, we implement a high precision version of the first algorithm in Sect. 4.4 using the vpa functionality of Matlab. The relative distance, based on the intrinsic distance (1.2), between this high precision computation and the result of the actual algorithms is shown in Fig. 4. For small condition numbers, the accuracy of all methods is similar on average, but as the conditioning of the matrices becomes worse, the accuracy of the projection method based on the Euclidean geometry deteriorates much faster than that of the method based on the Riemannian geometry. The former even fails to converge when the condition number of the matrices becomes very large. The accuracy of the two approaches in Sect. 4.4 is similar and deteriorates steadily as the condition numbers of the matrices increase.

Fig. 4 Accuracy of the projection methods when compared to a high precision version. The mean of the samples is connected by a line. In the legends, Euclidean indicates the method of Sect. 4.2, Riemannian indicates the first approach described in Sect. 4.4, and Riem-FFT the second

7 Conclusions

In this article a generalization of the Karcher mean for positive definite matrices to structured positive definite matrices was proposed. Besides a theoretical investigation and an adaptation of the desired properties of such a mean, algorithms for its computation were presented. In the design of the algorithms, two directions were pursued, one relying mostly on linear algebra and one based on differential geometry. A convergence analysis was performed, showing the superiority of the algorithm based on differential geometry. Numerical experiments compared the accuracy and speed of the various techniques and confirmed the theoretical analysis.

In the case of Toeplitz matrices, we have also considered the Kähler metric mean [4], investigating its properties and providing an explicit expression in the real case and a fast algorithm in the complex case. For Toeplitz matrices, neither the new structured geometric mean nor the Kähler metric mean has completely satisfactory properties: they are neither monotone, nor do they satisfy the arithmetic–geometric inequality. We wonder whether it is possible to provide a definition of a geometric mean for Toeplitz matrices which behaves well with respect to the ordering of positive matrices.