1 Introduction

The wish to generalize the concept of a geometric mean to positive definite (positive for short) matrices and, on the other hand, the need to average quantities expressed by positive matrices in certain applications have led to the definition and the study of the Karcher mean [8, 9, 33].

Without describing all facets, one can consider the set of n×n positive matrices, denoted by \(\mathcal{P}_{n}\), as a manifold [1]; in particular, there is a diffeomorphism from \(\mathcal{P}_{n}\) to \(\mathbb{R}^{n^{2}}\). At each point \(A\in\mathcal{P}_{n}\) one can define the tangent space \(T_{A}\mathcal{P}_{n}\), which can be identified with the space of Hermitian matrices. The Karcher mean can now be defined in terms of a Riemannian geometry on \(\mathcal{P}_{n}\) induced by the inner product

$$ g_A(X,Y):=\operatorname{tr} \bigl(A^{-1}XA^{-1}Y \bigr),\quad X,Y\in T_A\mathcal{P}_n $$
(1.1)

on the tangent space \(T_{A}\mathcal{P}_{n}\). This inner product \(g_{A}\) makes \(\mathcal{P}_{n}\) a complete Riemannian manifold with non-positive curvature and yields the following distance between two matrices \(A,B\in\mathcal{P}_{n}\):

$$ \delta(A,B)={ \Biggl(\sum_{k=1}^n \log^2 \lambda_k \Biggr)^{1/2}}, $$
(1.2)

where \(\lambda_{1},\ldots,\lambda_{n}\) are the eigenvalues of \(A^{-1}B\), which are positive numbers (for all the proofs see [8, Ch. 6]). The Karcher mean of a set of m positive matrices, \(A_{1},\ldots,A_{m}\in\mathcal{P}_{n}\), is defined as the unique positive minimizer \(G(A_{1},\ldots,A_{m})\) of the function

$$ f(X;A_1,\ldots,A_m):=\sum _{j=1}^m \delta^2(X,A_j). $$
(1.3)

Since this mean minimizes the sum of squared intrinsic distances to each of the matrices \(A_{j}\), it is a barycenter of these matrices with respect to the aforementioned metric.
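As a concrete illustration (not part of the original paper), the distance (1.2) and the cost function (1.3) can be evaluated with a few lines of dense linear algebra; the sketch below uses the fact that the eigenvalues of \(A^{-1}B\) coincide with the generalized eigenvalues of the pencil (B,A).

```python
# Minimal sketch, assuming dense Hermitian positive definite NumPy arrays.
import numpy as np
from scipy.linalg import eigvalsh

def delta(A, B):
    """Riemannian distance (1.2): sqrt(sum_k log^2 lambda_k), lambda_k eigenvalues of A^{-1}B."""
    lam = eigvalsh(B, A)          # generalized eigenvalues of (B, A), i.e. of A^{-1}B
    return np.sqrt(np.sum(np.log(lam) ** 2))

def karcher_cost(X, As):
    """Cost function (1.3): sum of squared distances from X to each A_j."""
    return sum(delta(X, A) ** 2 for A in As)
```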

An important feature of the Karcher mean is that it possesses all the properties desired of a geometric mean, such as the ten Ando-Li-Mathias (ALM) axioms [2]. For this reason, it is a viable tool in applications requiring some of these properties [7, 34]. A geometric mean should, for instance, be permutation invariant, monotone, and jointly concave, and should satisfy the arithmetic-geometric-harmonic mean inequality (see [2] for the precise statements of the properties). In particular, one of the most characteristic properties of a geometric mean is its invariance under inversion:

$$ G\bigl(A_1^{-1},\ldots,A_m^{-1} \bigr)=G(A_1,\ldots,A_m)^{-1}. $$
(1.4)

Before the proofs of all the properties of the Karcher mean were available, some of which are very elusive [10, 31], other definitions of a matrix geometric mean had been proposed [2, 12, 14, 35], even though nowadays there is broad agreement in considering the Karcher mean as the “right” matrix geometric mean.

In certain applications, however, besides being positive definite, the data matrices have some further structure in the sense that they belong to some special subset \(\mathcal{S}\), say a linear space. For instance, in the design and analysis of certain radar systems, the matrices to be averaged are correlation matrices, which are positive Toeplitz matrices or positive block Toeplitz matrices with Toeplitz blocks [5, 6, 29, 41]. In these cases, one would like the geometric mean to belong to the same class \(\mathcal{S}\) as the data. Unfortunately, the Karcher mean does not preserve many structures; in particular, the Karcher mean of Toeplitz and/or band matrices is typically no longer of Toeplitz and/or band form, as illustrated by the following simple example.

Example 1.1

Let \(\mathcal{T}\) be the set of tridiagonal Toeplitz matrices and choose \(A_{1},A_{2}\in\mathcal{T}\) with \(A_{1}=I\) and \(A_{2}=\operatorname{tridiag}(1,2,1)\), the matrix with 2’s on the main diagonal and 1’s on the sub- and superdiagonals. We have \(A_{1}A_{2}=A_{2}A_{1}\), thus the Karcher mean equals \((A_{1}A_{2})^{1/2}\). For n=3 we get

$$ (A_1A_2)^{1/2}=\frac{\sqrt{2}}{4} \left [ \begin{array}{c@{\quad}c@{\quad}c} \sqrt{2+\sqrt{2}}+2 & \sqrt{2}\sqrt{2-\sqrt{2}} & \sqrt{2+\sqrt{2}}-2\\ \sqrt{2}\sqrt{2-\sqrt{2}} & 2\sqrt{2+\sqrt{2}} & \sqrt{2}\sqrt{2-\sqrt{2}}\\ \sqrt{2+\sqrt{2}}-2 & \sqrt{2}\sqrt{2-\sqrt{2}} & \sqrt{2+\sqrt{2}}+2 \\ \end{array} \right ] $$
(1.5)

which is neither tridiagonal nor Toeplitz.
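A quick numerical confirmation of this example can be obtained with a few lines of code; the following sketch (ours, not the authors') checks that the corner entries of \((A_{1}A_{2})^{1/2}\) are nonzero and that its diagonal is not constant.

```python
# Minimal sketch: the mean of two commuting positive matrices is (A1 A2)^{1/2}.
import numpy as np
from scipy.linalg import sqrtm

n = 3
A1 = np.eye(n)
A2 = 2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # tridiag(1, 2, 1)

G = np.real(sqrtm(A1 @ A2))
print(np.round(G, 4))
print("corner (1,3):", G[0, 2])          # nonzero -> not tridiagonal
print("diagonal:", np.diag(G))           # not constant -> not Toeplitz
```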

In this paper we introduce the concept of a structured geometric mean of positive matrices in such a way that if \(A_{1},\ldots,A_{m}\in\mathcal{S}\), then their mean also belongs to \(\mathcal{S}\). Given a subset \(\mathcal{S}\) of \(\mathcal{P}_{n}\) and matrices \(A_{1},\ldots,A_{m}\in\mathcal{S}\), we say that \(G\in\mathcal{S}\) is a structured geometric mean with respect to \(\mathcal{S}\) of \(A_{1},\ldots,A_{m}\) if the function \(f(X;A_{1},\ldots,A_{m})\) of (1.3) takes its minimum value over \(\mathcal{S}\) at G. The set of all structured geometric means of \(A_{1},\ldots,A_{m}\) (all minimizers of f over \(\mathcal{S}\)) is denoted by \(G_{\mathcal{S}}=G_{\mathcal{S}}(A_{1},\ldots,A_{m})\).

We show that if \(\mathcal{S}\) is closed (and nonempty) then \(G_{\mathcal{S}}\) is nonempty and the matrices \(G\in G_{\mathcal{S}}\) satisfy most of the ALM axioms in a suitably adjusted form. For instance, the invariance under inversion property (1.4) turns into

$$G_{\mathcal{S}}(A_1,\ldots,A_m)=G_{\mathcal{S}^{-1}} \bigl(A_1^{-1},\ldots,A_m^{-1} \bigr)^{-1}, $$

where for a set \(\mathcal{U}\subseteq\mathcal{P}_{n}\) we denote \(\mathcal{U}^{-1}= \{X^{-1}: X\in\mathcal{U}\}\). That is, the inverse of any structured geometric mean of the matrices \(A_{1},\ldots,A_{m}\in\mathcal{S}\) with respect to \(\mathcal{S}\) coincides with a structured mean of the inverses \(A_{1}^{-1},\ldots, A_{m}^{-1}\) with respect to the set \(\mathcal{S}^{-1}\) where these inverses reside.

Moreover, we show that, in many interesting cases, structured geometric means can be characterized in terms of the positive solutions of a suitable vector equation and provide algorithms for their computation.

In the Toeplitz case we also consider a different approach, where the mean is defined as a barycenter for a suitable metric on the manifold [4]. We analyze this barycenter and its properties in detail, obtaining an explicit expression in the real case and a quick algorithm in the complex case.

The article is organized as follows. In Sect. 2 the cost function (1.3) is examined, with special focus on the existence of a minimizer over a closed set. The structured matrix mean itself is the subject of study in Sect. 3, where the theoretical properties it should satisfy are examined. Section 4 proposes two algorithms for computing a structured mean G in a linear space, together with their convergence analysis. For one algorithm, it is shown that the convergence speed is independent of the condition number of the mean and is faster when the condition numbers of the matrices \(A_{i}^{-1/2}GA^{-1/2}_{i}\), for i=1,…,m, are smaller. Because of its nature and its convergence properties, this algorithm can be viewed as the natural extension to the structured case of the Richardson-like algorithm introduced and analyzed in [11] for the computation of the Karcher mean of unstructured matrices. In Sect. 5, for Toeplitz matrices, a different structured matrix mean [4], defined as a barycenter, is considered, and an algorithm for computing it is developed. Section 6 shows numerical experiments related to the accuracy and speed of computing the structured matrix mean.

Here we recall some basic notation and properties that will be used in the rest of the paper. Given a matrix A, we denote by σ(A) the spectrum of A, that is, the set of all the eigenvalues of A, and by \(\rho(A)=\max_{\lambda\in\sigma(A)}|\lambda|\) the spectral radius of A. Moreover, we denote by \(\|A\|_{F}:=(\operatorname{trace}(A^{*}A))^{1/2}= (\sum_{i,j} |a_{ij}|^{2})^{1/2}\) the Euclidean (Frobenius) norm of A, and by \(\|A\|_{s}=\rho(A^{*}A)^{1/2}\) the spectral norm. By \(A^{*}\) we denote the conjugate transpose of A. Recall that for a positive matrix A there exists a unique positive solution to the equation \(X^{2}=A\). This solution, denoted by \(A^{1/2}\), is called the square root of A [8]. Given a matrix \(A\in\mathbb{C}^{n\times n}\), we use the \(\operatorname{vec}\)-operator to build \(\operatorname{vec}(A)\in\mathbb{C}^{n^{2}}\), the long vector obtained by stacking the columns of A. We will use the Kronecker product ⊗ such that \(A\otimes B\) is the block matrix whose (i,j)th block is \(a_{ij}B\). The vec operator and the Kronecker product interplay in the following way [23]

$$ \operatorname{vec}(ABC)=\bigl(C^T \otimes A\bigr) \operatorname{vec}(B). $$
(1.6)

Finally, we recall a natural partial order in \(\mathcal{P}_{n}\) that will be used in the following: let A and B be positive, we write \(A\ge B\) if the matrix \(A-B\) is positive semidefinite.

2 Existence of structured geometric means

In this section the existence of a structured geometric mean and its relation to the classical Karcher mean are studied. First, some preliminaries are recalled.

2.1 Uniqueness of the Karcher mean for positive matrices

The Riemannian geometry on \(\mathcal{P}_{n}\) given by the inner product (1.1) turns out to be complete and a parametrization of the geodesic joining two positive matrices A and B is known to be [8, 30]

$$ A\#_t B=A^{1/2} \bigl(A^{-1/2}BA^{-1/2} \bigr)^tA^{1/2}=A \bigl(A^{-1}B \bigr)^t,\quad t\in[0,1], $$
(2.1)

where the midpoint \(A\#_{1/2}B\) coincides with the geometric mean of the two matrices [24, 30].

Given a set of matrices \(A_{1},\ldots,A_{m}\in\mathcal{P}_{n}\), the function \(f(X)=f(X;A_{1},\ldots,A_{m})\) in (1.3) is strictly geodesically convex, which means that for any two different matrices \(X, Y\in\mathcal{P}_{n}\), we have

$$ f(X\#_t Y)< (1-t)f(X)+tf(Y),\quad 0<t<1. $$
(2.2)

This property follows from [8, Exercise 6.1.13], where it is stated that for m=1 the function f(X) is strictly geodesically convex. The case m>1 follows by summing up the m inequalities obtained by applying (2.2) to the functions \(f(X)=f(X;A_{i})\), for i=1,…,m, respectively.

Geodesical convexity is a key ingredient for the proof of the existence of a unique minimizer of f over \(\mathcal{P}_{n}\) given in [8, Ch. 6]. A different proof is obtained using the fact that \(\mathcal{P}_{n}\), with the inner product (1.1), forms a Cartan–Hadamard manifold [16, 28, 30, 32], which is a Riemannian manifold, complete, simply connected and with non-positive sectional curvature everywhere. On such a Cartan–Hadamard manifold the Karcher mean (the so-called center-of-mass) exists and is unique [17, 27, 28].

The notion of geodesical convexity in \(\mathcal{P}_{n}\) is different from the customary convexity in the Euclidean space where one requires that

$$f\bigl((1-t)X+tY\bigr)\le(1-t)f(X)+tf(Y),\quad t\in[0,1]. $$

In fact, the function f is not convex in the traditional sense as the following example shows.

Example 2.1

Consider the set consisting of the single matrix A=1, and \(x,y\in\mathbb{R}_{+}^{*}=\mathcal{P}_{1}\). We have \(f(x)=\delta^{2}(x,A)=\log^{2}(x)\), which is not convex. On the other hand, the function \(\log^{2}(x)\) is strictly geodesically convex, and this can be shown by an elementary argument: in fact, it is continuous and for x≠y

$$\begin{aligned} \delta^2(\sqrt{xy},1) =&\log^2(\sqrt{xy})= \frac{1}{4} \bigl(\log^2x+\log^2y+2\log x\log y \bigr) \\ =&\frac{1}{2} \bigl(\log^2 x+\log^2y \bigr)-\frac{1}{4} (\log x-\log y )^2 \\ <&\frac{1}{2} \bigl(\log^2 x+\log^2 y \bigr)= \frac{1}{2} \bigl(\delta^2(x,1)+\delta^2(y,1) \bigr). \end{aligned}$$

Iterative selection of midpoints and a continuity argument complete the proof.

Since f is strictly geodesically convex, it can be proved that it has a unique minimizer over any closed, geodesically convex subset \(\mathcal{S}\) of \(\mathcal{P}_{n}\), where we say that a subset \(\mathcal{S}\subseteq \mathcal{P}_{n}\) is geodesically convex if for any \(X,Y\in\mathcal{S}\) the entire geodesic \(X\#_{t}Y\), t∈[0,1], belongs to \(\mathcal{S}\). Indeed, if \(X_{1}\) and \(X_{2}\) were two different matrices in \(\mathcal{S}\) where f takes its minimum, then from (2.2) it would follow that \(f(X_{1}\#_{t}X_{2})<f(X_{1})=f(X_{2})\) for any 0<t<1, which contradicts the assumption.

2.2 Existence of structured geometric means on a closed set

For a generic closed subset \(\mathcal{U}\) of \(\mathcal{P}_{n}\), which is not necessarily geodesically convex, we can prove the existence of a minimum point by using the fact that f(X) is continuous.

Theorem 2.1

Let \(\mathcal{U}\subseteq\mathcal{P}_{n}\) be a closed subset. Then for any \(A_{1},\ldots,A_{m}\in\mathcal{P}_{n}\) the function \(f(X)=f(X;A_{1},\ldots,A_{m})\) has a minimum in \(\mathcal{U}\).

Proof

Consider the distance measures δ(X,Y), given in (1.2), which can also be written as \(\delta(X,Y)=\|\log(Y^{-1/2}XY^{-1/2})\|_{F}\), and d(X,Y) (the Thompson metric [39]), given by

$$d(X,Y) = \bigl\Vert\log\bigl(Y^{-1/2} X Y^{-1/2}\bigr) \bigr\Vert_s. $$

From the inequality between the Frobenius and the spectral norm, we have

$$d(X,Y) \leq\delta(X,Y) \leq\sqrt{n} d(X,Y) $$

for all \(X,Y\in\mathcal{P}_{n}\). We also define the order interval [X,Y] as

$$[X,Y] = \{ Z\in\mathcal{P}_n: X\leq Z\leq Y \}, $$

for all \(X,Y\in\mathcal{P}_{n}\) with \(X\le Y\) [21]. The order interval \([e^{-r}X,e^{r}X]\) turns out to be the closed ball of radius r>0 around X with respect to the Thompson metric; thus the order interval \([e^{-r}I,e^{r}I]\) is a compact subset of \(\mathcal{P}_{n}\).

Next, consider the set \(Q_{t} = \{ X \in\mathcal{P}_{n}: f(X)\leq t \}\). Because of the continuity of f, each \(Q_{t}\) is closed in \(\mathcal{P}_{n}\). The definition of f also guarantees the boundedness of \(Q_{t}\) for each t with respect to the Riemannian metric δ. By the above inequality between the distance measures, each \(Q_{t}\) is also bounded with respect to the Thompson metric. Hence it is possible to find some r such that \(Q_{t}\) is contained in the closed ball of radius r around I with respect to the Thompson metric, that is, \(Q_{t}\subset[e^{-r}I,e^{r}I]\). Since \(Q_{t}\) is now a closed subset of a compact set (in \(\mathcal{P}_{n}\)), it is also compact in \(\mathcal{P}_{n}\).

Finally, define \(B_{t} = Q_{t} \cap\mathcal{U}\), which is again a compact subset of \(\mathcal{P}_{n}\) (as the intersection of a compact and a closed set). The collection of all nonempty \(B_{t}\) is a descending family of nonempty, compact subsets of \(\mathcal{P}_{n}\), which has a nonempty intersection. The elements in this intersection are the minimizers of f in \(\mathcal{U}\). □

In general, uniqueness of the point where f(X) takes its minimum cannot be guaranteed. For instance, if both A and \(A^{-1}\) belong to \(\mathcal{U}\) while \(I=A\#_{1/2}A^{-1}\) does not, then the function \(f_{1}(X):=\delta^{2}(X,A)+\delta^{2}(X,A^{-1})\) reaches its minimum at a point \(I\ne G \in\mathcal{U}\). Clearly, \(f_{1}(G^{-1})=f_{1}(G)\) and if \(G^{-1}\ne G\) belongs to \(\mathcal{U}\), then we have at least two distinct points of minimum. A more concrete example is the following.

Example 2.2

Consider the 2×2 matrices A=I and \(B=\operatorname{diag}(a,1/a)\), where a>1. Define the segment \(\mathcal{U}= \{G(t)=A+t(B-A), t\in [0,1] \}\), which is closed and convex, but not geodesically convex. The function \(f(t)=\delta^{2}(G(t),A)+\delta^{2}(G(t),B)\) takes the form \(f(t)=\log^{2}((1-t)/a+t)+\log^{2}(a(1-t)+t)+\log^{2}((1-t)+t/a)+\log^{2}((1-t)+at)\) and is symmetric with respect to t=1/2. For a=200 the function has the graph shown in Fig. 1, with a local maximum at t=1/2 and two global minima close to the edges of the segment.

Fig. 1 Graph of \(f(t)=\delta^{2}(G(t),A)+\delta^{2}(G(t),B)\) for G(t)=A+t(B−A) with A=I and \(B=\operatorname{diag}(200,1/200)\)
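The behaviour described in this example is easy to reproduce numerically; the sketch below (an illustration, not taken from the paper) evaluates the explicit expression of f(t) on a grid and exhibits the local maximum at t=1/2 and a minimizer away from it.

```python
# Minimal sketch of Example 2.2 with a = 200, using the explicit formula for f(t).
import numpy as np

a = 200.0
def f(t):
    return (np.log((1 - t) / a + t) ** 2 + np.log(a * (1 - t) + t) ** 2
            + np.log((1 - t) + t / a) ** 2 + np.log((1 - t) + a * t) ** 2)

ts = np.linspace(1e-3, 1 - 1e-3, 2001)
vals = np.array([f(t) for t in ts])
print("f(1/2) =", f(0.5))                          # local maximum
print("argmin on the grid:", ts[np.argmin(vals)])  # a global minimizer away from t = 1/2
```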

3 A theoretical exploration of the structured geometric mean

In this section we discuss the relation between the structured and generic geometric mean, together with the adaptation of the generic properties to the structured setting. We will discuss just the real case, so in this section, the set \(\mathcal{P}_{n}\) stands for the manifold of real positive definite matrices whose tangent space is the set of real and symmetric matrices.

3.1 The geometric and structured geometric mean relation

The properties shown in Sect. 2 imply that a structured geometric mean with respect to \(\mathcal{U}\), as defined in the introduction, always exists for any closed subset \(\mathcal{U}\) of \(\mathcal{P}_{n}\). In particular, this holds in the cases where \(\mathcal{U}=\mathcal{S}\cap\mathcal{P}_{n}\) for any linear space \(\mathcal{S}\) of matrices and also for \(\mathcal{U}^{-1}:=\mathcal{S}^{-1}\cap \mathcal{P}_{n}\) where we define \(\mathcal{S}^{-1}= \{A^{-1}: A\in\mathcal{S}, \det A\ne0 \}\). This captures a wide class of interesting structures emerging in applications, e.g., Toeplitz and band matrices, as well as their inverses. For simplicity we will restrict our analysis in the remainder of the article to the real case.

More general structures are given in terms of a parametrization \(\sigma(t):\mathcal{V}\to \mathbb{R}^{n\times n}\), with σ a differentiable function defined on the open subset \(\mathcal{V}\) of \(\mathbb{R}^{q}\), which we will call the parameter space. The set \(\mathcal{T}=\sigma(\mathcal{V})\) is the structure determined by σ. If σ is linear and \(\mathcal{V}=\mathbb{R}^{q}\), then \(\mathcal{T}\) is a linear space. Examples of sets \(\mathcal{T}\) of interest which generally do not form a linear space are the set of matrices with a given displacement rank [13] and the sets of semiseparable [40] and quasiseparable [19] matrices. For an n×n symmetric Toeplitz matrix, a possible parametrization is given by

$$ \sigma(t)=\sigma\bigl([t_0,t_1, \ldots,t_{n-1}]\bigr)=\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c} t_0&t_1 &\ldots&t_{n-1}\\ t_1&\ddots&\ddots&\vdots\\ \vdots&\ddots&\ddots&t_1\\ t_{n-1}&\ldots&t_1&t_0 \end{array} \right ]. $$
(3.1)

For a band matrix, one can, e.g., just store the nonzero elements in a long vector and map them onto their exact locations. In the following, given a closed set \(\mathcal{T}\), we let \(\mathcal{U}=\mathcal{T}\cap\mathcal{P}_{n}\).

In Example 2.2 we illustrated that the minimum of the cost function restricted to a closed subset \(\mathcal{U}\subseteq \mathcal{P}_{n}\) is not necessarily unique. For this reason, we consider the structured geometric mean \(G_{\mathcal{U}}=G_{\mathcal{U}}(A_{1},\ldots,A_{m})\) of \(A_{1},\ldots,A_{m}\in\mathcal{U}\) as the set of matrices in \(\mathcal{U}\) where the function f(X) attains its minimum. Formally speaking, for \(A_{1},\ldots,A_{m}\in\mathcal{U}\), let \(g\in\mathbb{R}^{q}\) be such that \(\widehat{G}=\sigma(g)\in G_{\mathcal{U}}(A_{1},\ldots,A_{m})\), then

$$f\bigl(\sigma(g);A_1,\ldots,A_m\bigr)=\min _{t\in\mathbb{R}^q}f\bigl(\sigma (t);A_1,\ldots,A_m \bigr). $$

Since \(\mathcal{U}\subseteq\mathcal{P}_{n}\), the minimum over \(\mathcal{P}_{n}\) is less than or equal to the minimum over \(\mathcal{U}\). In general it will often happen that \(\widehat{G}\ne G(A_{1},\ldots,A_{m})\) like in (1.5).

3.2 Properties of the geometric mean conveyed to the structured mean setting

Some desired properties for a matrix geometric mean were stated by Ando, Li and Mathias in [2], of which the most notable are listed here. The other properties include joint concavity, the determinantal identity, and continuity from above.

Consistency with scalars :

If \(A_{1},\ldots,A_{m}\) commute, then

$$G(A_1,\ldots,A_m)=(A_1\cdots A_m)^{1/m}. $$
Permutation invariance :

For any permutation π of {1,…,m},

$$G(A_1,\ldots,A_m)=G (A_{\pi(1)}, \ldots,A_{\pi(m)} ). $$
Joint homogeneity :
$$G(\alpha_1 A_1,\alpha_2 A_2, \ldots ,\alpha_m A_m)=(\alpha_1\cdots \alpha_m)^{1/m}G(A_1,\ldots,A_m). $$
Monotonicity :

If \(A_{i} \geq A'_{i}\), for i=1,…,m, then

$$G(A_1,\ldots,A_m) \geq G\bigl(A'_1, \ldots,A'_m\bigr). $$
Invariance under congruence :

For any nonsingular M,

$$G\bigl(M^*A_1M,\ldots,M^*A_mM\bigr)=M^*G(A_1, \ldots,A_m)M. $$
Invariance under inversion :
$$G(A_1,\ldots,A_m)^{-1}=G\bigl(A_1^{-1}, \ldots,A_m^{-1}\bigr). $$
Arithmetic-geometric-harmonic mean inequality :
$$\frac{1}{m}(A_1+\cdots+A_m)\ge G(A_1, \ldots,A_m)\ge m \bigl({A_1^{-1}+ \cdots+A_m^{-1}} \bigr)^{-1}. $$

Yet another property naturally desired of a geometric mean, but not required in the list of Ando, Li and Mathias, is the repetition invariance, that is, for any set of positive matrices \(A_{1},\ldots,A_{m}\in\mathcal{P}_{n}\),

$$ G(A_1,\ldots,A_m,A_1, \ldots,A_m)=G(A_1,\ldots,A_m). $$
(3.2)

Now, we consider the properties of the structured geometric mean. Some properties such as the permutation invariance trivially hold, others should be restated. In fact, in the generic case the structures we consider are neither invariant under inversion nor under congruence. That is because if \(A\in\mathcal{U}\) then it need not necessarily hold that \(A^{-1}\in\mathcal{U}\) or \(M^{*}AM\in\mathcal{U}\).

We start with the invariance under inversion as this is one of the most characteristic properties of the geometric mean. To this end we consider the set \(\mathcal{T}^{-1}= \{T^{-1}: T\in\mathcal{T},~\det T\ne0 \}\) parametrized with the function σ −1(t):=σ(t)−1. Clearly, the intersection \(\mathcal{U}\) of \(\mathcal{T}\) with \(\mathcal{P}_{n}\) yields always invertible matrices, so that \(\mathcal{T}^{-1}\cap\mathcal{P}_{n}=\mathcal{U}^{-1}\).

According to our definition, the structured geometric mean of \(A_{1}^{-1},\ldots,A_{m}^{-1}\in\mathcal{U}^{-1}\) is given by the set \(G_{\mathcal{U}^{-1}}(A_{1}^{-1},\ldots,A_{m}^{-1})\). For any \(\widetilde{G}\in G_{\mathcal{U}^{-1}}\), we have \(\widetilde{G}=\sigma(\widetilde{g})^{-1}\) such that

$$f \bigl(\sigma (\widetilde{g} )^{-1};A_1^{-1}, \ldots ,A_m^{-1} \bigr)=\min_{t\in\mathbb{R}^q} f \bigl(\sigma(t)^{-1};A_1^{-1}, \ldots,A_m^{-1} \bigr). $$

Since \(\delta(A,B)=\delta(A^{-1},B^{-1})\), one gets \(f(X;A_{1},\ldots,A_{m})=f(X^{-1};A_{1}^{-1},\ldots,A_{m}^{-1})\) so that

$$f \bigl(\sigma(\widetilde{g});A_1,\ldots,A_m \bigr)=\min _{t\in \mathbb{R}^q}f \bigl(\sigma(t);A_1,\ldots,A_m \bigr) $$

and thus \(\widetilde{G}^{-1}\in G_{\mathcal{U}}(A_{1},\ldots,A_{m})\). Since \(\widetilde{G}\) was chosen arbitrarily, and since \(\mathcal{U}\) can be interchanged with \(\mathcal{U}^{-1}\), we have the analogue of the invariance under inversion for the structured geometric mean:

$$ G_{\mathcal{U}}(A_1,\ldots,A_m)^{-1}=G_{\mathcal{U}^{-1}} \bigl(A_1^{-1},\ldots, A_m^{-1} \bigr). $$
(3.3)

In a similar manner we can restate the invariance under congruence in a structured style by defining, for any nonsingular M, the set \(\mathcal{U}_{M}:=M^{*}\mathcal{U}M=\{M^{*}TM: T\in\mathcal{U}\}\). The invariance under congruence is then understood as

$$G_{\mathcal{U}_M}\bigl(M^*A_1M,\ldots,M^*A_mM \bigr)=M^*G_{\mathcal{U}}(A_1,\ldots , A_m)M. $$

Joint homogeneity, in order to be defined, requires that the set \(\mathcal{T}\) satisfies the following property:

$$A\in\mathcal{T}\quad\Rightarrow\quad\alpha A\in\mathcal{T} $$

for any scalar α>0. This property clearly holds if \(\mathcal{T}\) is a linear space or the set formed by the inverses of the nonsingular matrices of a linear space. For these sets, the joint homogeneity holds.

Repetition invariance holds true as well by (1.3), since

$$f(X;A_1,\ldots,A_m,A_1, \ldots,A_m)=2f(X;A_1,\ldots,A_m), $$

so the minimizers (over a subset) of the functions \(f(X;A_{1},\ldots,A_{m},A_{1},\ldots,A_{m})\) and \(f(X;A_{1},\ldots,A_{m})\) are the same.

Regarding the remaining properties, we observe that the consistency with scalars is violated, as Example 1.1 shows. Nevertheless, weaker consistency properties hold, such as idempotency, namely \(G_{\mathcal{U}}(A,A,\ldots,A)=A\) for each structure \(\mathcal{U}\) and \(A\in\mathcal{U}\). Moreover, if the set \(\mathcal{U}\) is closed and geodesically convex then

$$G_{\mathcal{U}}(A_1,\ldots,A_m)=G(A_1, \ldots,A_m), $$

so the geometric and the structured geometric mean coincide. An interesting case of a geodesically convex set is given by \(\mathcal{U}=\mathcal{T}\cap \mathcal{P}_{n}\), when \(\mathcal{T}\) is an algebra, i.e., a linear space closed under multiplication and inversion.

Finally, the properties related to the ordering of positive matrices, such as monotonicity, do not hold, as shown by the following numerical example.

Example 3.1

We consider the four Toeplitz matrices

$$\begin{aligned} &T_1= \left [\begin{array}{c@{\quad}c@{\quad}c} 1 & 1/2 & 1/2 \\ 1/2 & 1 & 1/2 \\ 1/2 & 1/2 & 1\\ \end{array} \right ] , \qquad T_2=T_1, \\ &T_3= \left [\begin{array}{c@{\quad}c@{\quad}c} 3/4 & 1/2 & 0 \\ 1/2 & 3/4 & 1/2 \\ 0 & 1/2 & 3/4 \\ \end{array} \right ] ,\qquad S= \left [\begin{array}{c@{\quad}c@{\quad}c} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1\\ \end{array} \right ] , \end{aligned}$$

and, using the algorithms presented in the next sections, we compute a structured geometric mean \(G_{\varepsilon}\) of the three matrices \(T_{1}\), \(T_{2}\) and \(T_{3}+\varepsilon S\) for various ε≥0. The norm of \(G_{\varepsilon}-G_{0}\) becomes small as ε tends to 0, and we observe that \(G_{\varepsilon}-G_{0}\) is not positive (semi)definite, while \(T_{3}+\varepsilon S\ge T_{3}\). This gives numerical evidence of the lack of monotonicity of a structured geometric mean. On the other hand, computing the arithmetic mean A of \(T_{1}\), \(T_{2}\) and \(T_{3}\), one also observes that the expected inequality \(A\ge G_{0}\) does not hold in this case.

3.3 The structured mean as solution(s) of a vector equation

We start from the Karcher mean, which is obtained as the unique solution in \(\mathcal{P}_{n}\) of the matrix equation

$$ \sum_{i=1}^m\log \bigl(XA_i^{-1} \bigr) = 0. $$
(3.4)

Equation (3.4) is obtained using the fact that f is differentiable and has a minimum at the Karcher mean. Thus the Karcher mean satisfies the condition \(\nabla f_{X}=0\), where \(\nabla f_{X}=2X^{-1}\sum_{i=1}^{m} \log (XA_{i}^{-1} )\) denotes the (Euclidean) gradient of f with respect to X (see [26, 33]). We remark already that a different metric will be studied in Sect. 4.3.
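For reference, the quantity \(\varGamma(X)=X^{-1}\sum_{i}\log(XA_{i}^{-1})\), i.e. one half of this gradient, can be evaluated as in the following sketch (our own illustration); the logarithm is taken of the symmetric positive definite matrix \(A_{i}^{-1/2}XA_{i}^{-1/2}\) and mapped back by a congruence.

```python
# Minimal sketch, assuming real symmetric positive definite NumPy arrays.
import numpy as np
from scipy.linalg import sqrtm, logm, solve

def Gamma(X, As):
    """Gamma(X) = X^{-1} sum_i log(X A_i^{-1}), i.e. one half of the Euclidean gradient."""
    S = np.zeros_like(X, dtype=float)
    for A in As:
        R = np.real(sqrtm(A))                          # A^{1/2}
        M = solve(R, solve(R, X).T).T                  # A^{-1/2} X A^{-1/2}
        S += R @ np.real(logm(M)) @ np.linalg.inv(R)   # = log(X A^{-1})
    return solve(X, S)
```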

In the general case, the restriction of f to a structure given by σ(t) is investigated. For any minimum g (with corresponding σ(g)) not located at the boundary of the parameter space, the gradient \(\nabla(f\circ\sigma)_{t}\) of the composed function with respect to t must be zero, so we are interested in the solutions of the vector equation \(\nabla(f\circ\sigma)_{t}=0\).

From the chain rule of differentiation, one obtains

$$\nabla(f\circ\sigma)_t= \biggl(\sum_{i,j} \frac{\partial f(\sigma (t))}{\partial x_{i,j}}\frac{d \sigma_{i,j}(t)}{d t_s} \biggr)_{s=1,\ldots,q}=0 $$

which leads to the vector equation

$$ \sum_{i,j} \bigl(\varGamma\bigl(\sigma(t) \bigr) \bigr)_{i,j}\frac{d \sigma _{i,j}(t)}{d t_s}=0,\quad s=1,\ldots,q, $$
(3.5)

where \(\varGamma(X):=\frac{1}{2}\nabla f_{X}\).

In the case where \(\mathcal{T}\) is a linear space, the function σ(t) is linear and can be written in matrix form as

$$\operatorname{vec}\bigl(\sigma(t)\bigr)=Ut,\quad U\in\mathbb{R}^{n^2\times q}, $$

so that (3.5) turns into

$$ U^T\operatorname{vec}\bigl(\varGamma\bigl(\sigma(t) \bigr)\bigr)=0,\qquad\varGamma (X)=X^{-1}\sum _{i=1}^m \log \bigl(XA_i^{-1} \bigr). $$
(3.6)

If σ(t) is chosen to be orthogonal, i.e., such that \(U^{T}U=I\), then \(U^{T}\) coincides with the Moore–Penrose inverse of U.

When \(\mathcal{T}\) denotes the set of symmetric Toeplitz matrices, the parametrization (3.1) leads to a matrix U having orthogonal columns. In fact one has \(U^{T}U=D=\operatorname{diag}(n,2(n-1),2(n-2),\ldots,2)\). In particular, for n=3 one has

$$U^T=\left [ \begin{array}{ccccccccc} 1&0&0&0&1&0&0&0&1\\ 0&1&0&1&0&1&0&1&0\\ 0&0&1&0&0&0&1&0&0 \end{array} \right ],\qquad U^TU=\operatorname{diag}(3,4,2). $$
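For illustration (this is not the authors' code), the matrix U of the Toeplitz parametrization (3.1) and the residual of the vector equation (3.6) can be assembled as follows; the column-major flattening plays the role of the \(\operatorname{vec}\) operator.

```python
# Minimal sketch for the symmetric Toeplitz structure; sigma(t) = toeplitz(t).
import numpy as np
from scipy.linalg import logm, toeplitz

def toeplitz_U(n):
    """Columns of U are vec of the basis matrices in (3.1): vec(sigma(t)) = U t."""
    U = np.zeros((n * n, n))
    for k in range(n):
        E = toeplitz(np.eye(n)[:, k])          # symmetric Toeplitz with e_k as first column
        U[:, k] = E.flatten(order="F")         # column-stacking vec
    return U

def residual(t, As):
    """U^T vec(Gamma(sigma(t))), the left-hand side of (3.6)."""
    n = len(t)
    X = toeplitz(t)
    G = np.linalg.solve(X, sum(np.real(logm(X @ np.linalg.inv(A))) for A in As))
    return toeplitz_U(n).T @ G.flatten(order="F")

print(toeplitz_U(3).T @ toeplitz_U(3))         # diag(3, 4, 2), as displayed above
```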

For \(\mathcal{T}\) being the set of symmetric tridiagonal matrices the parametrization

$$\sigma(t)=\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} t_1&t_{n+1}\\ t_{n+1}&t_2&t_{n+2}\\ &\ddots&\ddots&\ddots\\ & &t_{2n-2}&t_{n-1}&t_{2n-1}\\ & & &t_{2n-1}&t_n \end{array} \right ] $$

also leads to a matrix U having orthogonal columns. Moreover, \(U^{T}U=\operatorname{diag}(I_{n}, 2I_{n-1})\). For n=3, e.g., one has

$$U^T=\left [ \begin{array}{ccccccccc} 1&0&0&0&0&0&0&0&0\\ 0&0&0&0&1&0&0&0&0\\ 0&0&0&0&0&0&0&0&1\\ 0&1&0&1&0&0&0&0&0\\ 0&0&0&0&0&1&0&1&0 \end{array} \right ],\qquad U^TU=\operatorname{diag}(I_3,2I_2). $$

4 Algorithms for structured geometric means in the linear case

We will give two algorithms for computing structured geometric means when they are characterized in terms of the solutions g of a vector equation, as, for instance, in the linear case.

We first provide a general definition of a class of algorithms based on a preconditioned functional iteration, then we specialize to two algorithms given by two different preconditioners.

The first, provided in Sect. 4.2, is derived by relying on the projection of the gradient with respect to the Euclidean scalar product. The second, presented in Sect. 4.3, is obtained through projection with respect to the Riemannian metric of \(\mathcal{P}_{n}\) described in Sect. 1.

4.1 A preconditioned functional iteration and its convergence

Throughout this section we assume that \(A_{1},\ldots,A_{m}\in\mathcal{U}\), where \(\mathcal{U}=\mathcal{T}\cap\mathcal{P}_{n}\) and \(\mathcal{T}\) is a linear space with a parametrization σ(t) such that \(\operatorname{vec}(\sigma(t))=Ut\), and \(D=U^{T}U\).

The structured geometric mean \(G_{\mathcal{U}}\) is defined as the set of minimizers of the function \(f(X;A_{1},\ldots,A_{m})\) over \(\mathcal{U}\). These minimizers must be sought among the stationary points of the function f, that is, among the solutions to the vector equation (3.6).

Therefore, a way to design algorithms for computing structured means \(G_{\mathcal{U}}\) is to apply numerical techniques to solve the vector equation (3.6). We consider a preconditioned Richardson-like iteration constructed in the spirit of [11]. Let V(X) be a nonsingular and sufficiently differentiable matrix function and define

$$ \begin{aligned} & \varphi(t)=t-\theta S(t),\qquad S(t)=V\bigl(\sigma(t)\bigr)^{-1}U^T \operatorname{vec} \bigl(\varGamma\bigl(\sigma(t)\bigr)\bigr), \\ & t_{\nu+1}=\varphi(t_\nu),\quad \nu=0,1,\ldots, \end{aligned} $$
(4.1)

where θ is a parameter introduced to enhance convergence, V(σ(t)) is a preconditioner and \(t_{0}\) is a given vector such that \(\sigma(t_{0})\) is positive. Observe that the fixed points of φ(t) are the solutions of the vector equation (3.6); hence, if convergent, the sequence \(t_{\nu}\) converges to a solution of (3.6).
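Schematically, the iteration (4.1) can be implemented as follows; this is only a sketch, where sigma, gamma and V are user-supplied callables for \(\sigma(t)\), \(\varGamma(X)\) and the preconditioner, and the stopping criterion is a simple relative test on the correction.

```python
# Minimal sketch of the preconditioned Richardson-like iteration (4.1).
import numpy as np

def structured_mean_iteration(t0, sigma, U, gamma, V, theta, tol=1e-12, maxit=500):
    t = np.asarray(t0, dtype=float)
    for _ in range(maxit):
        X = sigma(t)                               # current structured iterate sigma(t_nu)
        r = U.T @ gamma(X).flatten(order="F")      # U^T vec(Gamma(sigma(t_nu)))
        s = np.linalg.solve(V(X), r)               # S(t_nu) = V(sigma(t_nu))^{-1} r
        t = t - theta * s
        if np.linalg.norm(s) <= tol * np.linalg.norm(t):
            break
    return t
```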

In the following, given a matrix function f(X), where \(X=(x_{i,j})\) and f(X) are n×n matrices, we denote by \(J_{f}(G)\) the \(n^{2}\times n^{2}\) Jacobian matrix of \(\operatorname{vec}(f(X))\) with respect to the variable \(\operatorname{vec}(X)\) computed at X=G; similarly, we denote by \(J_{f\circ\sigma}(t_{G})\) the \(n^{2}\times q\) Jacobian of the composed function \(\operatorname{vec}(f(\sigma(t)))\) with respect to the variables \((t_{1},\ldots,t_{q})\) at \(t=t_{G}\). In this notation, the function in the subscript as well as the variable between parentheses specify whether the derivatives are taken w.r.t. the matrix variable X or the vector variable t.

Observe that if V(σ(t)) is chosen as the Jacobian of \(U^{T}\operatorname{vec}(\varGamma(\sigma(t)))\), then (4.1) coincides with Newton’s iteration.

If \(t_{G}\) is a solution of (3.6) and if \(t_{\nu}\) is sufficiently near to \(t_{G}\), then

$$t_{\nu+1}-t_G=J_\varphi(t_G) (t_\nu-t_G)+O \bigl(\|t_\nu-t_G \| ^2 \bigr), $$

so that in order to study the local convergence of this sequence it is sufficient to estimate the spectral radius ρ, or any induced norm, of \(J_{\varphi}(t_{G})\) and determine θ in such a way that \(\rho(J_{\varphi}(t_{G}))<1\). Notice that the Jacobian of φ(t) at \(t=t_{G}\) is given by \(I-\theta K\), where \(K=J_{S}(t_{G})\) is the Jacobian of S(t) at \(t=t_{G}\). Therefore, if we can find a preconditioner V(t) such that K has real positive eigenvalues with minimum and maximum eigenvalues \(\kappa_{\min}\) and \(\kappa_{\max}\), respectively, then the choice \(\theta=2/(\kappa_{\min}+\kappa_{\max})\) ensures local convergence and provides the minimum spectral radius of \(J_{\varphi}(t_{G})\), given by

$$\rho\bigl(J_\varphi(t_G)\bigr)=\frac{\kappa_{\max}-\kappa_{\min}}{\kappa _{\max}+\kappa_{\min}}= \frac{\mu-1}{\mu+1}<1,\quad\mu=\kappa_{\max}/\kappa_{\min}. $$

Moreover, any values \(\hat{\kappa}_{\min}\le\hat{\kappa}_{\max}\) such that \(\hat{\kappa}_{\min}\le \kappa_{\min}\le\kappa_{\max}\le\hat{\kappa}_{\max}\) can be used instead of \(\kappa_{\min}\) and \(\kappa_{\max}\) to determine a value \(\hat{\theta}=2/(\hat{\kappa}_{\min}+\hat{\kappa}_{\max})\) which ensures convergence. Also notice that the closer μ is to 1, the faster the convergence of the iteration.

Therefore our goal is to perform a spectral analysis of K and to find an upper bound on the ratio \(\mu=\kappa_{\max}/\kappa_{\min}\), assuming that all the eigenvalues of K are real positive. From the composition rule of derivatives one finds that

$$K=V\bigl(\sigma(t_G)\bigr)^{-1} U^TJ_\varGamma(G) U+J_{V(\sigma (t_G))^{-1}}\bigl(\sigma(t_G)\bigr) U^T \operatorname{vec}\bigl(\varGamma\bigl(\sigma(t_G)\bigr)\bigr) $$

and since \(U^{T}\operatorname{vec}(\varGamma(\sigma(t_{G})))=0\), it follows that

$$ K=V\bigl(\sigma(t_G)\bigr)^{-1}U^TJ_\varGamma(G) U. $$
(4.2)

To evaluate \(J_{\varGamma}(G)\), we recall that \(\varGamma(X)=\sum_{i=1}^{m} X^{-1}\log(XA_{i}^{-1})\), so that it is sufficient to determine the formal expression of \(J_{\psi}(G)\) for \(\psi(G,A)=G^{-1}\log(GA^{-1})\) for a generic A and then to write \(J_{\varGamma}(G)=\sum_{i=1}^{m}J_{\psi(G,A_{i})}(G)\). In order to evaluate \(J_{\psi}(G)\), we rely on the definition of the Fréchet derivative of a matrix function f(X) at X in the direction E,

$$Df_X[E]=\lim_{t\to0}\frac{f(X+tE)-f(X)}{t}=\frac{d}{dt}\bigg\vert_{t=0}f(X+tE). $$

In fact, the \(n^{2}\times n^{2}\) Jacobian matrix \(J_{f}(X)\) of the vector function \(\operatorname{vec}\circ f \circ \operatorname{vec}^{-1}\) at \(\operatorname{vec}(X)\) is related to the Fréchet derivative by the equation

$$ \operatorname{vec}\bigl(Df_X[E] \bigr)=J_f(X)\operatorname{vec}(E). $$
(4.3)

We recall also the following properties of the Fréchet derivative [22], where f,g are given matrix functions and \(\varphi(X)=X^{-1}\):

$$ \begin{aligned} &D(fg)_X[E]=Df_X[E]g(X)+f(X)Dg_X[E], \quad \hbox{product rule,} \\ &D(f\circ g)_X[E]=Df_{g(X)}\bigl[Dg_X[E] \bigr],\quad \hbox{chain rule,} \\ &D\varphi_{X}[E]=-X^{-1}EX^{-1},\quad \hbox{inversion.} \end{aligned} $$
(4.4)

For the derivative of the exponential function we have (see [22, Eq. 10.17a])

$$J_{\exp}(Y)=(I\otimes\exp Y)\beta \bigl(Y^T\otimes I-I \otimes Y \bigr),\quad\beta(z)=\bigl(e^z-1\bigr)/z. $$

Therefore, since \(J_{\log}(X)=J_{\exp}(Y)^{-1}\) for \(Y=\log X\), we find that

$$ J_{\log}(X)=\gamma \bigl(\log \bigl(X^T \bigr) \otimes I-I\otimes \log X \bigr) \bigl(I\otimes X^{-1} \bigr),\quad \gamma(z)=z/\bigl(e^z-1\bigr). $$
(4.5)

We are now ready to provide an explicit expression of the Fréchet derivative of the function \(\psi(X,A)=X^{-1}\log(XA^{-1})\) and of the Jacobian \(J_{\psi}(X)\).

Lemma 4.1

Let \(\psi(X)=X^{-1}\log(XA^{-1})\). Assume that A and X are positive. For the matrix \(J_{\psi}(X)\) such that \(\operatorname{vec}(D\psi_{X}[E])=J_{\psi}(X)\operatorname{vec}(E)\) we have

$$\begin{aligned} &J_{\psi}(X)=-X^{-1}\log \bigl(XA^{-1} \bigr)\otimes X^{-1} +\bigl(A^{-1}\otimes X^{-1}\bigr)\gamma(W) \bigl(I\otimes AX^{-1}\bigr), \\ &W=\log \bigl(XA^{-1} \bigr)\otimes I-I\otimes\log \bigl(XA^{-1} \bigr), \end{aligned}$$

with \(\gamma(z)=z/(e^{z}-1)\).

Proof

Since \(h(X):=\log(XA^{-1})\) is the composition of \(f(X)=\log(X)\) and \(g(X)=XA^{-1}\), we get by (4.4)

$$Dh_X[E]=D{\log}_{XA^{-1}}\bigl[EA^{-1}\bigr]. $$

As ψ(X) is the product of \(f(X)=X^{-1}\) and h(X), (4.4) gives us

$$D{\psi}_X[E]=-X^{-1}EX^{-1}\log \bigl(XA^{-1} \bigr)+X^{-1}Dh_X[E]. $$

Combining the latter two equations yields

$$D{\psi}_X[E]=-X^{-1}EX^{-1}\log \bigl(XA^{-1} \bigr)+X^{-1}D{\log }_{XA^{-1}} \bigl[EA^{-1}\bigr]. $$

By using (4.3) and (1.6) we find that the matrix \(J_{\psi}(X)\) representing \(D\psi_{X}\) is given by

$$J_{\psi}(X)=- \bigl(X^{-1}\log \bigl(XA^{-1} \bigr) \bigr)^T\otimes X^{-1}+ \bigl(I\otimes X^{-1} \bigr)J_{\log} \bigl(XA^{-1} \bigr) \bigl(A^{-T} \otimes I \bigr). $$

Replacing (4.5) in the equation above and using the fact that \(A=A^{T}\), \(X=X^{T}\) yields

$$\begin{aligned} J_{\psi}(X) =&-\log \bigl(A^{-1}X \bigr)X^{-1} \otimes X^{-1} \\ &{}+\bigl(I\otimes X^{-1}\bigr)\gamma \bigl(\log \bigl(A^{-1}X \bigr)\otimes I-I\otimes\log \bigl(XA^{-1} \bigr) \bigr) \bigl(A^{-1}\otimes AX^{-1} \bigr). \end{aligned}$$

Using the fact that \(W\log(V)W^{-1}=\log(WVW^{-1})\), the first term can be written as \(-X^{-1}\log(XA^{-1})\otimes X^{-1}\). The second term can be written as \((I\otimes X^{-1})(A^{-1}\otimes I)\gamma (\log(XA^{-1})\otimes I-I\otimes\log(XA^{-1}) )(I\otimes AX^{-1})\), which completes the proof. □
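The formula of Lemma 4.1 can be checked numerically against a finite-difference approximation of the Fréchet derivative; the following sketch (our own, using the column-stacking \(\operatorname{vec}\) convention of (1.6)) evaluates γ(W) through an eigendecomposition of \(M=A^{-1/2}XA^{-1/2}\).

```python
# Minimal sketch: finite-difference check of J_psi(X) for psi(X) = X^{-1} log(X A^{-1}).
import numpy as np
from scipy.linalg import logm, sqrtm

def gamma_fun(z):
    """gamma(z) = z / (e^z - 1), with the removable singularity at z = 0 set to 1."""
    z = np.asarray(z, dtype=float)
    out = np.ones_like(z)
    nz = np.abs(z) > 1e-14
    out[nz] = z[nz] / np.expm1(z[nz])
    return out

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n)); A = B @ B.T + n * np.eye(n)   # random positive A
C = rng.standard_normal((n, n)); X = C @ C.T + n * np.eye(n)   # random positive X
E = rng.standard_normal((n, n)); E = (E + E.T) / 2             # symmetric direction

Ah = np.real(sqrtm(A)); Ahi = np.linalg.inv(Ah)
d, Q = np.linalg.eigh(Ahi @ X @ Ahi)            # M = A^{-1/2} X A^{-1/2} = Q diag(d) Q^T
S, Si = Ah @ Q, Q.T @ Ahi                       # L = log(X A^{-1}) = S diag(log d) S^{-1}
L = S @ np.diag(np.log(d)) @ Si

# gamma(W) for W = L (x) I - I (x) L, through the same similarity transformation
gvals = gamma_fun(np.subtract.outer(np.log(d), np.log(d))).flatten()
gW = np.kron(S, S) @ np.diag(gvals) @ np.kron(Si, Si)

Xi, Ai = np.linalg.inv(X), np.linalg.inv(A)
J = -np.kron(Xi @ L, Xi) + np.kron(Ai, Xi) @ gW @ np.kron(np.eye(n), A @ Xi)

psi = lambda Y: np.linalg.solve(Y, np.real(logm(Y @ Ai)))
h = 1e-6
fd = (psi(X + h * E) - psi(X)) / h              # forward difference in direction E
print(np.linalg.norm(J @ E.flatten("F") - fd.flatten("F")))   # small, O(h)
```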

Recall that \(\varGamma(X)=\sum_{i=1}^{m}\psi(X,A_{i})\) and \(G^{-1}\sum_{i=1}^{m}\log(GA_{i}^{-1})=0\), for \(G=\sigma(t_{G})\). Then by Lemma 4.1, we obtain the following formula for the Jacobian \(J_{\varGamma}(G)\):

$$\begin{aligned} &J_\varGamma(G)= \bigl(I\otimes G^{-1}\bigr)H \bigl(I\otimes G^{-1}\bigr),\qquad H=\sum_{i=1}^mH_i, \\ &H_i=\bigl(A_i^{-1}\otimes I\bigr)\gamma \bigl(\log\bigl(GA_i^{-1}\bigr)\otimes I-I\otimes\log \bigl(GA_i^{-1}\bigr) \bigr) (I\otimes A_i). \end{aligned}$$

Moreover, by using the properties of the Kronecker product and the relation \(\log(GA^{-1})=A^{1/2}\log(A^{-1/2}GA^{-1/2})A^{-1/2}\), we can write

$$\begin{aligned} & H_i= \bigl(A_i^{-1/2}\otimes A_i^{1/2}\bigr) \gamma(\log M_i\otimes I-I \otimes\log M_i) \bigl(A_i^{-1/2}\otimes A_i^{1/2}\bigr), \\ &M_i=A_i^{-1/2}GA_i^{-1/2}. \end{aligned}$$

From this expression it turns out that \(H_{i}\) is positive, and from (4.2) we find that \(J_{S}(t_{G})\) is the product of the matrix \(V(\sigma(t_{G}))^{-1}\) and the matrix \(U^{T}(I\otimes G^{-1})\sum_{i=1}^{m} H_{i} (I\otimes G^{-1})U\), which is positive.

Thus we may conclude with the following

Theorem 4.1

The Jacobian K of the function S(t) in (4.2) at \(\sigma(t_{G})=G\) is given by

$$\begin{aligned} &K=V^{-1}U^T \bigl(I\otimes G^{-1} \bigr) H \bigl(I\otimes G^{-1} \bigr)U, \\ &H=\sum_{i=1}^m H_i,\quad H_i= \bigl(A_i^{-1/2}\otimes A_i^{1/2} \bigr) \gamma (\log M_i\otimes I-I \otimes\log M_i ) \bigl(A_i^{-1/2}\otimes A_i^{1/2} \bigr), \\ &M_i=A_i^{-1/2}GA_i^{-1/2}, \\ &\gamma(z)=z/\bigl(e^z-1\bigr). \end{aligned}$$

Moreover, the eigenvalues of K are the solutions of the equation

$$\det \bigl(\kappa V - U^T\bigl(I\otimes G^{-1}\bigr)H \bigl(I\otimes G^{-1}\bigr) U \bigr)=0. $$

4.2 An elementary preconditioner

The simplest choice for the preconditioner V(t) in (4.1) is \(V(t)=U^{T}U=D\). This corresponds to projecting the gradient of the function \(f(X;A_{1},\ldots,A_{m})\) onto the set \(\mathcal{U}\) according to the Euclidean scalar product. The problem \(\det(\kappa I-K)=0\) turns into the generalized q-dimensional symmetric eigenvalue problem

$$\det \bigl(U^T \bigl(\kappa I-\bigl(I\otimes G^{-1}\bigr)H \bigl(I\otimes G^{-1}\bigr) \bigr) U \bigr)=0. $$

This problem is the projection onto the space spanned by the columns of U of the problem \(\det(\nu I-(I\otimes G^{-1})H(I\otimes G^{-1}))=0\), which has real positive solutions.

Now we recall the following result, valid for general positive matrices A,B, which relates the generalized eigenvalues of the pair (A,B) to the ones of the projected pair \((U^{T}AU,U^{T}BU)\).

Lemma 4.2

Let A,B be positive n×n matrices and U an n×m matrix. Then the generalized eigenvalues of the pair \((U^{T}AU,U^{T}BU)\), which solve the equation \(\det(U^{T}(A-\kappa B)U)=0\), are real positive and lie in between the maximum and minimum eigenvalues λ of the pair (A,B), such that \(\det(A-\lambda B)=0\). Moreover, the extreme eigenvalues \(\lambda_{\min},\lambda_{\max}\) of the pair (A,B) are such that \(\alpha_{\min}/\beta_{\max}\le\lambda_{\min}\le\lambda_{\max}\le\alpha_{\max}/\beta_{\min}\), where \(\alpha_{\min},\alpha_{\max},\beta_{\min},\beta_{\max}\) are the minimum and maximum eigenvalues of the matrices A and B, respectively.

Proof

The condition \(\det(\lambda B-A)=0\) is equivalent to \(\det(\lambda I-B^{-1/2}AB^{-1/2})=0\), which has real positive solutions since \(B^{-1/2}AB^{-1/2}\) is positive. The remaining part of the lemma follows from the fact that the maximum and minimum eigenvalues of the larger and smaller problems coincide with the maximum and minimum values of the Rayleigh quotient \(x^{T}Ax/x^{T}Bx\) for \(x\in\mathbb{R}^{n}\setminus \{0\}\) and for \(x\in\operatorname{span}(U)\), respectively. □

A first consequence of the above lemma is that the extreme eigenvalues \(\kappa_{\min}\) and \(\kappa_{\max}\) of K are in between the minimum and the maximum eigenvalue of the \(n^{2}\)-dimensional symmetric matrix \(Y=(I\otimes G^{-1})H(I\otimes G^{-1})\), so that the ratio μ between the maximum and minimum eigenvalue of K is less than or equal to the condition number μ(Y) of the symmetric matrix Y. Moreover, since \(Y=\sum_{i=1}^{m} Y_{i}\) with

$$\begin{aligned} Y_i =&\bigl(A_i^{-1/2}\otimes A_i^{-1/2}\bigr) \bigl(I\otimes M_i^{-1} \bigr) \gamma(\log M_i\otimes I-I\otimes\log M_i) \\ &\times\bigl(I\otimes M_i^{-1}\bigr) \bigl(A_i^{-1/2}\otimes A_i^{-1/2} \bigr), \end{aligned}$$

one finds that \(\widehat{k}_{\min}:=\sum_{i=1}^{m}\lambda_{\min}^{(i)}\le\kappa_{\min}\) and \(\widehat{k}_{\max}:=\sum_{i=1}^{m}\lambda_{\max}^{(i)}\ge\kappa_{\max}\), where \(\lambda_{\min}^{(i)}\) and \(\lambda_{\max}^{(i)}\) are the minimum and the maximum eigenvalues of Y i . Moreover, from Lemma 4.2 and from the expression above for Y i it follows that \(\lambda^{(i)}_{\min}\ge\gamma^{(i)}_{\min}/(\alpha_{\max}^{(i)})^{2}\), \(\lambda^{(i)}_{\max}\le\gamma^{(i)}_{\max}/(\alpha_{\min}^{(i)})^{2}\), where \(\alpha_{\min}^{(i)}\), \(\alpha^{(i)}_{\max}\) are the minimum and the maximum eigenvalues of A i , respectively, while \(\gamma^{(i)}_{\min}\) and \(\gamma^{(i)}_{\max}\) are the minimum and maximum eigenvalues of \((I\otimes M_{i}^{-1}) \gamma(\log M_{i}\otimes I-I\otimes\log M_{i}) (I\otimes M_{i}^{-1})\), respectively.

From the properties of the matrix function γ(⋅) and from the properties of the Kronecker product one finds that the eigenvalues of the latter matrix can be explicitly given in terms of the eigenvalues \(\nu^{(i)}_{r}\) of the matrix M i . In fact, they coincide with \(\frac{1}{({\nu^{(i)}_{s}})^{2}}(\log t^{(i)}_{r,s})/(t^{(i)}_{r,s}-1)\) where \(t^{(i)}_{r,s}=\frac{\nu^{(i)}_{r}}{\nu^{(i)}_{s}}\).

Since the function \((\log t)/(t-1)\) is monotonically decreasing, its minimum and maximum over the values \(t^{(i)}_{r,s}\in[1/\mu^{(i)},\mu^{(i)}]\) are

$$\begin{aligned} \eta^{(i)}_{\min}&=\bigl(\log\mu^{(i)}\bigr)/\bigl( \mu^{(i)}-1\bigr), \\ \eta^{(i)}_{\max}&=\log\bigl(1/\mu^{(i)}\bigr)/ \bigl(1/\mu^{(i)}-1\bigr)=\mu ^{(i)}\bigl(\log\mu^{(i)} \bigr)/\bigl(\mu^{(i)}-1\bigr), \end{aligned}$$

for \(\mu^{(i)}=\mu(M_{i})\) the spectral condition number of \(M_{i}\). Additionally, taking the factor \((\nu^{(i)}_{s})^{-2}\) into consideration gives

$$\begin{aligned} \gamma^{(i)}_{\min}&\geq\eta^{(i)}_{\min} \bigl(\nu_{\max }^{(i)}\bigr)^{-2}, \\ \gamma^{(i)}_{\max}&\leq\eta^{(i)}_{\max} \bigl(\nu^{(i)}_{\min }\bigr)^{-2} \leq \mu^{(i)} \bigl(\nu^{(i)}_{\min}\bigr)^{-2}, \end{aligned}$$

where \(\nu_{\min}^{(i)}\) and \(\nu_{\max}^{(i)}\) denote the minimum and maximum eigenvalue of \(M_{i}\), respectively.

Therefore, we may conclude that the eigenvalues of K are bounded by \(\widetilde{\kappa}_{\min}:=\sum_{i=1}^{m} \eta^{(i)}_{\min}/(\nu ^{(i)}_{\max}\alpha^{(i)}_{\max})^{2}\) and \(\widetilde{\kappa}_{\max}:=\sum_{i=1}^{m} \eta^{(i)}_{\max}/(\nu ^{(i)}_{\min}\alpha^{(i)}_{\min})^{2}\).

Observe that this bound deteriorates when either some matrix \(A_{i}\) is ill-conditioned or some matrix \(A_{i}^{-1/2}GA_{i}^{-1/2}\) is ill-conditioned. The latter case cannot occur if the matrices \(A_{i}\) do not differ much from G. The dependence of this bound on the conditioning of the \(A_{i}\) makes this algorithm very inefficient when some \(A_{i}\) is ill-conditioned. This drawback is overcome in the next section, where we design a more effective preconditioner.

4.3 A preconditioner based on a differential geometric viewpoint

The Karcher mean for positive matrices admits a beautiful interpretation in terms of differential geometry. It can be considered as the center of mass for a well chosen inner product on the manifold of positive matrices. In this section and in Sect. 5 we consider two approaches inspired by this idea. For more information we refer to the overview in [25] and to the articles [15, 20, 36–38].

When considering a manifold optimization approach, the intersection \(\mathcal{U}\) of a linear space \(\mathcal{T}\) with the manifold of positive matrices \(\mathcal{P}_{n}\) can be viewed as a Riemannian submanifold of \(\mathcal{P}_{n}\) itself, which in turn is called the enveloping space. This entails that the inner product from this enveloping space is induced on the submanifold. An immediate consequence is that the gradient of the cost function for the submanifold is given by the orthogonal projection (with respect to the inner product) of the gradient for the enveloping space. Just as the space of symmetric matrices is the tangent space to the manifold of positive matrices, the intersection \(\mathcal{V}\) of the linear space \(\mathcal{T}\) with the space of symmetric matrices is the tangent space to \(\mathcal{U}\).

First consider the manifold of positive matrices endowed with the Euclidean inner product \(g_{X} ( A,B ) = \operatorname{tr}(A B)\), with A and B symmetric, and X a positive matrix. Note that even though this inner product \(g_{X}\) is independent of X, the subscript notation is kept for consistency. In this case, the orthogonal projection of a symmetric matrix A onto \(\mathcal{T}\) gives a matrix T, with

$$ \operatorname{vec}(T)= U \bigl(U^T U \bigr)^{-1} U^T \operatorname{vec}(A), $$

or \(\operatorname{vec}(T)=Ut\), with

$$ t= \bigl(U^T U \bigr)^{-1} U^T \operatorname{vec}(A). $$
(4.6)

The expression for the gradient of the Karcher cost function, corresponding to the Euclidean inner product, is known for the manifold of positive matrices and is given by

$$\begin{aligned} \operatorname{grad}_{e} f (X;A_1, \ldots, A_m) =& 2 X^{-1} \sum_{i=1}^m \log \bigl(X A_i^{-1} \bigr) \\ =& 2 X^{-1/2} \sum_{i=1}^m \log \bigl(X^{1/2} A_i^{-1} X^{1/2} \bigr) X^{-1/2}. \end{aligned}$$
(4.7)

The gradient naturally defines the direction of steepest ascent. Nevertheless, the gradient lies in the tangent space, and to build an algorithm from this, a practical way is to follow the gradient and then go back to the manifold through a suitable function, called a retraction. The precise definition of a retraction, together with the general theoretical assumptions it should satisfy, can be found in [1]. Figure 2(a) graphically illustrates the concept of a retraction, where a vector \(\xi_{X}\) in the tangent space \(T_{X}\mathcal{P}_{n}\) of the positive matrices is retracted to a point \(R_{X}(\xi_{X})\) residing on the manifold \(\mathcal{P}_{n}\). The classical steepest descent algorithm on a manifold is graphically depicted in Fig. 2(b). The thin red lines depict the contour lines, the blue arrows the gradients, and the green curves the retractions to the manifold.

Fig. 2 Graphical representation of a retraction and steepest descent flow

Observe that for \(\mathcal{P}_{n}\) immersed in the set of symmetric matrices, the tangent space at a point is the whole set of symmetric matrices. So one can consider the basic retraction \(R_{X}(A)=X+A\) for a sufficiently small symmetric matrix A.

Plugging the gradient (4.7) into the projection (4.6) and applying a gradient descent method with the basic retraction \(R_{X}(A)=X+A\), we arrive exactly at the Richardson-like algorithm (4.1) for finding the fixed points of the function φ.

However, since the function f to be minimized is defined through the distance (1.2), it is more natural to consider the manifold of positive matrices endowed with the inner product \(g_{X} ( A,B ) = \operatorname{tr} (A X^{-1} B X^{-1} )\), with A, B and X as before. In this case, the gradient for the enveloping space is known to be

$$\begin{aligned} \operatorname{grad}_{n} f (X;A_1, \ldots, A_m) =& 2 X \sum_{i=1}^m \log \bigl( A_i^{-1} X \bigr) \\ =& 2 X^{1/2} \sum_{i=1}^m \log \bigl(X^{1/2} A_i^{-1} X^{1/2} \bigr) X^{1/2}. \end{aligned}$$

Note the difference with (4.7).

The orthogonal projection T onto the intersection \(\mathcal{V}\) (of \(\mathcal{T}\) and the space of symmetric matrices) of this gradient, with respect to the Riemannian scalar product, can be found as the solution of the equations

$$\begin{aligned} &\operatorname{grad}_{n} f (X) = T + S , \\ &g_X ( S,K ) = \operatorname{tr} \bigl(S X^{-1} K X^{-1} \bigr) = 0, \quad \mbox{for every } K \in\mathcal{V}. \end{aligned}$$

Writing again \(\operatorname{vec}(T)=Ut\), we find in parameter space

$$ t= \bigl( U^T \bigl( X^{-1} \otimes X^{-1} \bigr) U \bigr)^{-1} U^T \bigl( X^{-1} \otimes X^{-1} \bigr) \operatorname {vec}\bigl( \operatorname{grad}_{n} f (X)\bigr). $$
(4.8)

The factor \(U^{T}(X^{-1}\otimes X^{-1})U\) is recurring and is abbreviated as \(D_{X}\), where the subscript points to the intrinsic variable X. Observe that this Riemannian orthogonal projection can be seen as a Euclidean oblique projection where the two bases of the subspace are the columns of U and of \((X^{-1}\otimes X^{-1})U\), respectively.

Using this expression, it is possible to define another gradient descent method where we are now searching the fixed points of the function

$$ \varphi(t)=t - \theta D_{\sigma(t)}^{-1} U^T \bigl( \sigma(t)^{-1} \otimes\sigma(t)^{-1} \bigr) \operatorname{vec} \Biggl(\sigma(t) \sum_{i=1}^m \log \bigl( A_i^{-1} \sigma(t) \bigr) \Biggr). $$
(4.9)

Relying on (1.6) to incorporate the Kronecker product into the vectorization, we find that \((\sigma^{-1}\otimes\sigma^{-1})\operatorname{vec}(\sigma\sum_{i=1}^{m}\log(A_{i}^{-1}\sigma))= \operatorname{vec}(\sum_{i=1}^{m}\log(A_{i}^{-1}\sigma)\sigma^{-1})\). Applying a property of the matrix logarithm we may rewrite the latter expression as \(\operatorname{vec}(\sigma^{-1}\sum_{i=1}^{m}\log(\sigma A_{i}^{-1}))\). This way, (4.9) takes the form of (4.1) with

$$V( \sigma(t) ) = U^T \bigl( \sigma(t)^{-1} \otimes \sigma(t)^{-1}\bigr)U. $$
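In the simplest (dense) setting, this preconditioner can be formed explicitly; the following sketch is an illustration only and builds the full Kronecker product, while Sect. 4.4 below discusses how to avoid that in the Toeplitz case.

```python
# Minimal sketch: the preconditioner V(X) = U^T (X^{-1} kron X^{-1}) U of Sect. 4.3.
import numpy as np

def riemannian_preconditioner(X, U):
    Xi = np.linalg.inv(X)
    return U.T @ np.kron(Xi, Xi) @ U
```

Used inside the iteration sketched after (4.1), this choice of V leads to the convergence behaviour analyzed next.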

To analyze the convergence of (4.1) with this choice for V(σ(t)), we have to analyze the eigenvalues of the Jacobian \(K=J_{S}(t_{G})\) of S(t) in (4.1), where the equation \(\det(\kappa I-K)=0\) takes the form of the following generalized eigenvalue problem

$$ \det \bigl(U^T \bigl(\kappa\bigl(G^{-1}\otimes G^{-1}\bigr)- \bigl(I\otimes G^{-1}\bigr) H\bigl(I\otimes G^{-1}\bigr) \bigr)U \bigr)=0. $$
(4.10)

Since the two matrices in (4.10) are positive, in view of Lemma 4.2, the solutions of this generalized eigenvalue problem are real positive and are located in between the maximum and the minimum solution of the larger problem

$$\det \bigl(\lambda\bigl(G^{-1}\otimes G^{-1}\bigr)- \bigl(I \otimes G^{-1}\bigr) H\bigl(I\otimes G^{-1}\bigr) \bigr)=0, $$

which in turn can be rewritten as a standard eigenvalue problem

$$\det \bigl(\lambda I -\bigl(G^{1/2}\otimes G^{1/2}\bigr) \bigl(I\otimes G^{-1}\bigr) H \bigl(I\otimes G^{-1}\bigr) \bigl(G^{1/2}\otimes G^{1/2}\bigr) \bigr)=0. $$

Since \(H=\sum_{i=1}^{m} H_{i}\), and the matrices H i are real symmetric, the eigenvalues of this problem are located in between the sum of the minimum and the sum of the maximum eigenvalues of each subproblem

$$ \det \bigl(\lambda I - \bigl(G^{1/2}\otimes G^{1/2}\bigr) \bigl(I\otimes G^{-1}\bigr) H_i \bigl(I\otimes G^{-1}\bigr) \bigl(G^{1/2}\otimes G^{1/2}\bigr) \bigr)=0, $$
(4.11)

that is, \(\det(\lambda(G^{-1}\otimes G)-H_{i})=0\), or equivalently \(\det(\lambda I-(G\otimes I)H_{i}(I\otimes G^{-1}))=0\). The matrix in the latter expression is similar to \((A_{i}^{-1/2}\otimes A_{i}^{-1/2}) (G\otimes I)H_{i}(I\otimes G^{-1}) (A_{i}^{1/2}\otimes A_{i}^{1/2})\), which, using the expression of \(H_{i}\) provided in Theorem 4.1, can be written as

$$(M_i\otimes I) \gamma( \log M_i\otimes I-I\otimes\log M_i) \bigl(I\otimes M_i^{-1}\bigr). $$

This way, the eigenvalues of (4.11) can be explicitly given in terms of the eigenvalues \(\nu^{(i)}_{r}\) of the matrix M i . In fact, they coincide with the \(t^{(i)}_{r,s}(\log t^{(i)}_{r,s})/(t^{(i)}_{r,s}-1)\) where \(t^{(i)}_{r,s}=\frac{\nu^{(i)}_{r}}{\nu^{(i)}_{s}}\).

Since the function \(t(\log t)/(t-1)\) is monotone, for the minimum and maximum solutions of (4.11) we have

$$\begin{aligned} \eta^{(i)}_{\min}&= \bigl(1/\mu^{(i)}\bigr) \log \bigl(1/\mu^{(i)}\bigr)/\bigl(1/\mu ^{(i)}-1\bigr)=\bigl(\log \mu^{(i)}\bigr)/\bigl(\mu^{(i)}-1\bigr), \\ \eta^{(i)}_{\max}&=\mu^{(i)}\bigl(\log \mu^{(i)}\bigr)/\bigl(\mu^{(i)}-1\bigr), \end{aligned}$$

respectively, for \(\mu^{(i)}=\mu(M_{i})\) the spectral condition number of \(M_{i}\). Therefore, we may conclude that the eigenvalues of K lie between \(\sum_{i=1}^{m} \eta^{(i)}_{\min}\) and \(\sum_{i=1}^{m} \eta^{(i)}_{\max}\). This way, we find for the optimal value of θ and for the optimal spectral radius the estimates

$$\begin{aligned} &\theta=\frac{2}{\sum_{i=1}^m \frac{\mu^{(i)}+1}{\mu^{(i)}-1}\log \mu^{(i)}}, \\ & \rho=\frac{\sum_{i=1}^m \log\mu^{(i)}}{\sum_{i=1}^m \frac{\mu ^{(i)}+1}{\mu^{(i)}-1}\log\mu^{(i)}}. \end{aligned} $$

It is interesting to point out that in this case the convergence speed is related neither to the condition number of the geometric mean G nor to those of the matrices \(A_{i}\), but only to the relative distances of G from each \(A_{i}\), measured by the quantities \(\mu^{(i)}=\mu(M_{i})\), \(M_{i}=A_{i}^{-1/2}GA_{i}^{-1/2}\). The closer they are to 1, the faster the convergence. Therefore, if the matrices to be averaged are not too far from each other, so that the quantities \(\mu(M_{i})\) are close to 1, then the optimal value of θ is close to 1/m and very fast convergence is expected. This analysis is confirmed by the numerical experiments.
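These estimates are straightforward to evaluate once G (or a good approximation of it) is available; the sketch below (ours, not the authors') computes θ and ρ from the condition numbers \(\mu^{(i)}\) of \(A_{i}^{-1/2}GA_{i}^{-1/2}\).

```python
# Minimal sketch of the step-size and spectral-radius estimates of Sect. 4.3.
import numpy as np
from scipy.linalg import eigvalsh

def theta_and_rho(G, As):
    mus = []
    for A in As:
        lam = eigvalsh(G, A)                   # eigenvalues of A^{-1/2} G A^{-1/2}
        mus.append(lam.max() / lam.min())
    mus = np.array(mus)
    w = np.full_like(mus, 2.0)                 # limit of (mu+1)/(mu-1)*log(mu) as mu -> 1
    big = mus > 1.0 + 1e-14
    w[big] = (mus[big] + 1) / (mus[big] - 1) * np.log(mus[big])
    return 2.0 / w.sum(), np.sum(np.log(mus)) / w.sum()   # theta, rho
```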

4.4 The case of Toeplitz matrices

From the computational point of view, at each step of the iteration (4.1) one has to compute \(U^{T}\operatorname{vec}(\varGamma(\sigma(t)))\) and then solve a linear system with the matrix V(σ(t)). The former computation, based on (3.6), requires \(O(mn^{3})\) arithmetic operations (ops), while the cost of the latter depends on the structure of V(σ(t)).

In this section we examine the case where \(\mathcal{U}\) is the class of symmetric Toeplitz matrices and where σ(t) associates t with the Toeplitz matrix having as first column t. We describe a way to make the algorithm of Sect. 4.3 more efficient.

Indeed, for the iteration analyzed in Sect. 4.2, V is the diagonal matrix with diagonal entries (n,2n−2,…,2) and the cost of solving a system with matrix V amounts to n divisions.

The iteration examined in Sect. 4.3 has a higher convergence speed, but at each step an n×n system with \(V=U^{T}(X^{-1}\otimes X^{-1})U\) must be solved, where X is a symmetric positive definite Toeplitz matrix.

We split the computation into two steps. In the first, the \(n^{2}\) entries of V are computed; in the second, a standard \(O(n^{3})\)-ops linear system solver is used. Concerning the first step, we discuss two approaches.

In both approaches the inverse of the Toeplitz matrix X needs to be computed, which can be done efficiently using the Gohberg–Semencul formula [13]. Here, vectors \(v_{1}\), \(v_{2}\), \(v_{3}\), \(v_{4}\) are determined such that \(X^{-1}=L(v_{1})L(v_{2})^{T}-L(v_{3})L(v_{4})^{T}\), where L(v) is the lower triangular Toeplitz matrix whose first column is v. From these, the \(n^{2}\) entries of \(X^{-1}\) can be found. The overall cost is \(O(n^{2})\) ops.

  1.

    As a first attempt, the entries of V are computed in a straightforward manner using the entries of \(X^{-1}\):

    $$V=\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c} \gamma_{1,1} & 2\gamma_{1,2} & \cdots& 2\gamma_{1,n} \\ 2\gamma_{1,2} & 2\gamma_{2,2} & \cdots& 2\gamma_{2,n} \\ \vdots & \vdots& \ddots& \vdots\\ 2\gamma_{1,n} & 2\gamma_{2,n} & \cdots& 2\gamma_{n,n} \end{array} \right ], $$

    where

    $$\begin{aligned} & \gamma_{1,j}=\sum_{i=1}^n \sum _{k=1}^{n-j+1} \bigl(X^{-1} \bigr)_{i,k} \bigl(X^{-1}\bigr)_{i,k+j-1}, \\ & \gamma_{j,p}= \sum_{i=1}^{n-j+1} \sum_{k=1}^{n-p+1} \bigl( \bigl(X^{-1} \bigr)_{i,k} \bigl(X^{-1}\bigr)_{i+j-1,k+p-1} + \bigl(X^{-1}\bigr)_{i,k+p-1} \bigl(X^{-1} \bigr)_{i+j-1,k} \bigr). \end{aligned}$$

    The cost of this approach in terms of arithmetic operations is of the order \(O(n^{4})\).

  2.

    In the second approach, we show that the cost of this computation can be kept at the level of \(O(n^{3}\log n)\) ops by combining the Gohberg–Semencul formula and the FFT. For a given i, the product vector \(w_{i}=(X^{-1}\otimes X^{-1})Ue_{i}\), where \(e_{i}\) is the ith vector of the canonical basis, is such that \(w_{i}=\operatorname{vec}(X^{-1}E_{i}X^{-1})\), with \(E_{i}\) being the symmetric Toeplitz matrix whose first column is \(e_{i}\). Therefore, compute first the columns of \(E_{i}X^{-1}\) by performing \(O(n^{2})\) additions, and then multiply \(X^{-1}\) by these columns, stacking the results to obtain \(w_{i}\). This computation is performed in \(O(n^{2}\log n)\) operations for each i by using the Gohberg–Semencul formula, since the multiplication of a lower triangular Toeplitz matrix and a vector can be performed in \(O(n\log n)\) operations by means of the FFT [13]. Therefore the overall computation of this stage for i=1,…,n is \(O(n^{3}\log n)\) ops. Finally, compute for any i the vector \(U^{T}w_{i}\) at the cost of \(O(n^{2})\) additions. A dense-arithmetic sketch of this column-by-column construction, without the FFT acceleration, is given after this list.
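The sketch below (our own illustration, in dense arithmetic and without the Gohberg–Semencul/FFT acceleration) shows the column-by-column construction underlying the second approach.

```python
# Minimal sketch: build V = U^T (X^{-1} kron X^{-1}) U column by column, where the
# i-th column of (X^{-1} kron X^{-1}) U equals vec(X^{-1} E_i X^{-1}).
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_precond_dense(X):
    n = X.shape[0]
    Xi = np.linalg.inv(X)                     # Sect. 4.4 computes this via Gohberg-Semencul
    U = np.zeros((n * n, n))
    W = np.empty((n * n, n))
    for i in range(n):
        Ei = toeplitz(np.eye(n)[:, i])        # symmetric Toeplitz with e_i as first column
        U[:, i] = Ei.flatten(order="F")
        W[:, i] = (Xi @ Ei @ Xi).flatten(order="F")   # vec(X^{-1} E_i X^{-1})
    return U.T @ W                            # the n x n matrix V
```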

The performance of these methods will be compared in Sect. 6.

5 Kähler metric mean for Toeplitz matrices

The Karcher mean of positive definite matrices has the specific interpretation of being the barycenter of the given matrices for the natural metric (1.2) on this manifold. Hence there are, in a certain sense, two possible generalizations: on the one hand, one can try to generalize the geometric mean concept; on the other hand, one can try to generalize the barycenter concept. Previously we focused on an extension of the geometric mean. Hereafter we focus on the manifold of positive definite Toeplitz matrices itself, denoted by \(\mathcal{T}_{n}\), and consider a barycenter in this case. This mean cannot be called a geometric mean in the sense of satisfying all required properties, but, through its intuitive definition, many desirable properties could still arise.

The concept of a barycenter is not restricted to the specific metric used to define the Karcher mean. For example, when the set \(\mathcal{T}_{n}\) is endowed with the classical Euclidean inner product, the resulting barycenter is nothing else than the arithmetic mean. Using a probability argument, in [3, 4] a metric on \(\mathcal{T}_{n}\) is introduced, called the Kähler metric. This metric results in a complete, simply connected manifold with non-positive sectional curvature everywhere, that is, a Cartan–Hadamard manifold. Thus, by [17, 27], existence and uniqueness of the barycenter with respect to this metric are guaranteed.

We will recall some known facts about the Kähler metric, and then we will give an explicit formula for the barycenter in the real case and a numerical procedure to compute the barycenter in the complex case.

To construct the Kähler metric, a Toeplitz matrix is first transformed to an n-tuple \((P_0,\mu_1,\ldots,\mu_{n-1})\) in \(\mathbb{R}_{+}^{*}\times\mathbb{D}^{n-1}\), with \(\mathbb{R}_{+}^{*}\) the set of strictly positive real numbers and \(\mathbb{D}\) the set of complex numbers of modulus less than one. This transformation, denoted by \(\zeta(T)=[P_T,\mu_{T,1},\ldots,\mu_{T,n-1}]^T\), is performed as follows:

$$P_T = t_0,\qquad \mu_{T,j} = (-1)^j \frac{\det(S_j)}{\det(R_j)}, $$

with \(t_0\) the main diagonal element of T, \(R_j\) the principal submatrix of size j of T (the upper left j×j submatrix) and \(S_j\) obtained by shifting \(R_j\) down one row or, equivalently, by removing the first row and last column of \(R_{j+1}\) (the inverse transformation can be found in [42]). In what follows, we use this one-to-one relation between the Toeplitz matrices and the corresponding n-tuples, and, when clear from the context, we will neglect the distinction and identify one with the other.
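A direct transcription of this transformation is sketched below (illustrative names; the determinants are computed naively for clarity, although a Levinson-type recursion could be used instead).

```python
import numpy as np

def toeplitz_to_tuple(T):
    """Map a positive Toeplitz matrix T to the tuple (P_T, mu_1, ..., mu_{n-1})
    in R_+^* x D^{n-1}, using mu_j = (-1)^j det(S_j)/det(R_j), with R_j the
    leading j x j submatrix of T and S_j equal to R_{j+1} without its first
    row and last column."""
    n = T.shape[0]
    P_T = T[0, 0].real
    mu = np.empty(n - 1, dtype=complex)
    for j in range(1, n):
        R_j = T[:j, :j]
        S_j = T[1:j + 1, :j]        # shift R_j down by one row within T
        mu[j - 1] = (-1) ** j * np.linalg.det(S_j) / np.linalg.det(R_j)
    return P_T, mu
```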

Let \(X=[P_X,\mu_{X,1},\ldots,\mu_{X,n-1}]^T\) and \(Y=[P_Y,\mu_{Y,1},\ldots,\mu_{Y,n-1}]^T\) be the transforms of two positive Toeplitz matrices. The metric is then given by

$$\begin{aligned} &\operatorname{d}(X,Y) = \Biggl( n \sigma(P_X,P_Y)^2 + \sum_{j=1}^{n-1} (n-j)\tau( \mu_{X,j},\mu_{Y,j})^2 \Biggr)^{1/2}, \\ &\sigma(P_X,P_Y) = \biggl| \log \biggl(\frac{P_Y}{P_X} \biggr) \biggr|, \qquad \tau(\mu_{X,j},\mu_{Y,j}) = \mathrm{atanh} \biggl( \biggl|\frac{\mu _{Y,j}-\mu_{X,j}}{1-\overline{\mu}_{X,j}\,\mu_{Y,j}} \biggr| \biggr), \end{aligned}$$

where \(\operatorname{atanh}(z)=\frac{1}{2}\log (\frac {1+z}{1-z} )\).
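Given two tuples obtained from a transformation such as the hypothetical toeplitz_to_tuple above, the distance can be evaluated by a direct transcription of these formulas (a sketch):

```python
import numpy as np

def kahler_distance(P_X, mu_X, P_Y, mu_Y):
    """Kaehler-metric distance between the tuples (P_X, mu_X) and (P_Y, mu_Y),
    where mu_X and mu_Y are arrays of the n-1 coefficients in the unit disk."""
    n = len(mu_X) + 1
    sigma = abs(np.log(P_Y / P_X))
    tau = np.arctanh(np.abs((mu_Y - mu_X) / (1 - np.conj(mu_X) * mu_Y)))
    weights = n - np.arange(1, n)               # the factors (n - j), j = 1..n-1
    return np.sqrt(n * sigma**2 + np.sum(weights * tau**2))
```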

The barycenter of the positive Toeplitz matrices \(T_i\), for i=1,…,m, with respect to the Kähler metric will be denoted by \(B(T_1,\ldots,T_m)=[P_B,\mu_{B,1},\ldots,\mu_{B,n-1}]^T\). It is obtained in this transformed space by minimizing the function

$$\begin{aligned} f(X) &= \sum_{i=1}^m \operatorname{d}^2(X,T_i) \end{aligned}$$

over \(\mathbb{R}_{+}^{*} \times\mathbb{D}^{n-1}\). Notice that the problem of minimizing f(X) can be decoupled into the problems of minimizing \(\varphi_{0}(x)=\sum_{i=1}^{m} \sigma(x,P_{T_{i}})^{2}\) over \(\mathbb{R}_{+}^{*}\), and the n−1 scalar functions

$$\varphi_j(z)=\sum_{i=1}^m \tau(z,\mu_{T_i,j})^2,\quad j=1,\ldots,n-1 $$

over \(\mathbb{D}\). The minimizer of \(\varphi_0(x)\) is easily obtained as \(P_{B}=(P_{T_{1}}\cdots P_{T_{m}})^{1/m}\) by solving the equation \(\nabla\varphi_0(x)=0\). The minimizer of \(\varphi_j(z)\) is nothing else than the barycenter of \(\mu _{T_{1},j},\ldots,\mu_{T_{m},j}\) with respect to the customary Poincaré metric on the unit disk and is the point where the gradient

$$ \nabla\varphi_j(z)= 2\bigl(|z|^2 - 1 \bigr) \sum_{i=1}^m\operatorname {sign}(c_{i,j})\operatorname{atanh}\bigl(|c_{i,j}|\bigr),\quad c_{i,j}=\frac{\mu_{{T_i},j}-z}{1-\overline{z}\mu_{{T_i},j}}, $$
(5.1)

equals zero.

In the real case we are able to find an explicit expression for this barycenter as well, since \(\operatorname{sign}(c)\operatorname {atanh}(|c|)=\operatorname{atanh}(c)\) and after some manipulations we get

$$\mu_{B,j}=\mathcal{C} \bigl( \bigl(\mathcal{C}(\mu_{T_1,j}) \cdots \mathcal{C}(\mu_{T_m,j}) \bigr)^{1/m} \bigr), $$

where \(\mathcal{C}(z)=(1-z)/(1+z)\) is the Cayley transform.
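For completeness, the omitted manipulations can be reconstructed as follows (a sketch for the real case, based on the addition formula \(\operatorname{atanh}(a)-\operatorname{atanh}(b)=\operatorname{atanh} (\frac{a-b}{1-ab} )\)). Setting the gradient (5.1) equal to zero amounts to

$$0=\sum_{i=1}^m \operatorname{atanh} \biggl(\frac{\mu_{T_i,j}-z}{1-z\mu_{T_i,j}} \biggr) =\sum_{i=1}^m \bigl(\operatorname{atanh}(\mu_{T_i,j})-\operatorname{atanh}(z) \bigr), $$

that is, \(\operatorname{atanh}(z)=\frac{1}{m}\sum_{i=1}^m\operatorname{atanh}(\mu_{T_i,j})\). Since \(\operatorname{atanh}(z)=-\frac{1}{2}\log\mathcal{C}(z)\), exponentiating gives \(\mathcal{C}(z)=(\mathcal{C}(\mu_{T_1,j})\cdots\mathcal{C}(\mu_{T_m,j}))^{1/m}\), and applying the involution \(\mathcal{C}\) once more yields the stated expression for \(\mu_{B,j}\).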

In the complex case we were not able to find such an explicit formula, but a quick numerical method can be devised using a gradient descent algorithm. We recall that the tangent space to the Poincaré disk can be identified with the complex plane, and thus for a sufficiently small tangent vector \(v\in\mathbb{C}\) one can consider the retraction \(R_z(v)=z+v\), which captures the fact that the manifold is an open subset of the complex plane. The resulting algorithm to find the barycenter of \(\mu_1,\ldots,\mu_m\in\mathbb{D}\) is given by the iteration

$$ \begin{aligned} &z_{k+1}=z_k+t_k v_k, \\ &v_k=\bigl(1-|z_k|^2\bigr)\sum_{i=1}^m \operatorname{sign}(c_{i,k}) \operatorname{atanh}(|c_{i,k}|), \\ &c_{i,k}=\frac{\mu_i-z_k}{1-\overline{z}_k\mu_i}, \end{aligned} $$
(5.2)

for a suitable initial value \(z_0\) and a sufficiently small step length \(t_k\).
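A minimal sketch of iteration (5.2) is given below; the starting point and the fixed step length are illustrative choices, not prescribed above, and \(\operatorname{sign}(c)\) is interpreted for complex c as the unit number c/|c|.

```python
import numpy as np

def disk_barycenter(mu, t=0.1, tol=1e-12, maxit=10000):
    """Barycenter of the points mu (all of modulus < 1) in the Poincare disk,
    computed with the gradient iteration (5.2) and the retraction z + t*v."""
    mu = np.asarray(mu, dtype=complex)
    z = mu.mean()                               # illustrative starting point
    for _ in range(maxit):
        c = (mu - z) / (1 - np.conj(z) * mu)
        r = np.abs(c)
        unit = np.divide(c, r, out=np.zeros_like(c), where=r > 0)   # "sign" of c
        v = (1 - abs(z) ** 2) * np.sum(unit * np.arctanh(r))
        if abs(v) < tol:
            break
        z = z + t * v
    return z
```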

Another possibility is to consider the retraction

$$R_z(v)=\frac{z+e^{i \theta}+(z-e^{i \theta})e^{-s}}{ 1+\overline{z}e^{i \theta}+(1-\overline{z}e^{i \theta})e^{-s}},\quad \theta= \arg v, \ s= \frac{2|v|}{1-|z|^2}, $$

which corresponds to moving along the geodesics of the Poincaré disk. The corresponding gradient descent method is

$$z_{k+1}=R_{z_k}(t_k v_k), $$

with the same \(v_k\) as in (5.2).
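In a sketch, this retraction simply replaces the Euclidean update \(z_k+t_kv_k\) in the iteration; the function name below is illustrative.

```python
import numpy as np

def geodesic_retraction(z, v):
    """Move from z along the Poincare-disk geodesic determined by the tangent
    vector v, following the formula above; for v = 0 the point z is returned."""
    if v == 0:
        return z
    e = np.exp(1j * np.angle(v))                 # e^{i theta}, theta = arg v
    d = np.exp(-2 * abs(v) / (1 - abs(z) ** 2))  # e^{-s}
    return (z + e + (z - e) * d) / (1 + np.conj(z) * e + (1 - np.conj(z) * e) * d)

# In the loop of disk_barycenter, replace  z = z + t * v  by
# z = geodesic_retraction(z, t * v)  to follow the geodesics instead.
```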

5.1 Properties of the Kähler barycenter

Regarding the properties of this barycenter, it is easily seen that it is permutation invariant, repetition invariant and idempotent (as is any barycenter). Moreover, for any α>0, the transformed tuple of αT is \([\alpha P_T,\mu_{T,1},\ldots,\mu_{T,n-1}]^T\), and from the explicit expression of \(P_B\) in the real case we get \(B(\alpha T_1,T_2,\ldots,T_m)=\alpha^{1/m}B(T_1,\ldots,T_m)\), that is, homogeneity holds.

Unfortunately, this new barycenter does not possess other properties as shown by the following example.

Example 5.1

From the explicit expression for the mean in the real case we get a simple formula for the Kähler barycenter of two 2×2 real matrices

$$T_1=\left [ \begin{array}{c@{\quad}c} x_1 & y_1 \\ y_1 & x_1 \end{array} \right ],\qquad T_2=\left [ \begin{array}{c@{\quad}c} x_2 & y_2 \\ y_2 & x_2 \end{array} \right ], $$

namely

$$B(T_1,T_2)=\sqrt{x_1x_2} \left [ \begin{array}{c@{\quad}c} 1 & \frac{a-b}{a+b } \\ \frac{a-b}{a+b} & 1 \end{array} \right ],\quad \mbox{with } \left\{ \begin{array}{l} a=\sqrt{ (x_1+y_1)(x_2+y_2) } \\ b=\sqrt{(x_1-y_1)(x_2-y_2)} \end{array} \right.. $$

Now consider the following matrices

$$T_1=\left [ \begin{array}{c@{\quad}c} 2 & 1 \\ 1 & 2 \end{array} \right ],\qquad \widetilde{T}_1=\left [ \begin{array}{c@{\quad}c} 4 & -1 \\ -1 & 4 \end{array} \right ], \qquad T_2=\left [ \begin{array}{c@{\quad}c} 2 & -1 \\ -1 & 2 \end{array} \right ], $$

with \(\widetilde{T}_{1}\ge T_{1}\). By symbolic computation, one gets that

$$B(\widetilde{T}_1,T_2)=\left [ \begin{array}{c@{\quad}c} 2\sqrt{2} & \sqrt{2}(\sqrt{5}-3) \\ \sqrt{2}(\sqrt{5}-3) & 2\sqrt{2} \end{array} \right ]\not\ge B(T_1,T_2)=\left [ \begin{array}{c@{\quad}c} 2 & 0 \\ 0 & 2 \end{array} \right ], $$

in fact one eigenvalue of \(B(\widetilde{T}_{1},T_{2})-B(T_{1},T_{2})\) is \(\lambda=\sqrt{10}-2-\sqrt{2}<0\). Thus, we have proved that the Kähler barycenter is not monotonic. Moreover,

$$B(T_1,T_2)\neq(T_1T_2)^{1/2}= \left [ \begin{array}{c@{\quad}c} \sqrt{3} & 0 \\ 0 & \sqrt{3} \end{array} \right ], $$

and hence the Kähler barycenter does not coincide with the Karcher mean for circulant matrices. In particular, it is not a structured geometric mean as defined in Sect. 1.

Observe that in the previous example \(B(T_1,T_2)\) surprisingly coincides with the arithmetic mean of \(T_1\) and \(T_2\). However, it is not difficult to construct examples where the inequality \(B(T_1,T_2)\le(T_1+T_2)/2\), which should hold for a geometric mean, fails.
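The quantities appearing in Example 5.1 are easy to check numerically; the short standalone script below (illustrative, based on the closed-form two-matrix expression above) reproduces them.

```python
import numpy as np
from scipy.linalg import sqrtm

def kahler_barycenter_2x2(T1, T2):
    """Kaehler barycenter of two real symmetric 2x2 Toeplitz matrices,
    using the closed-form expression of Example 5.1."""
    (x1, y1), (x2, y2) = (T1[0, 0], T1[0, 1]), (T2[0, 0], T2[0, 1])
    a = np.sqrt((x1 + y1) * (x2 + y2))
    b = np.sqrt((x1 - y1) * (x2 - y2))
    r = (a - b) / (a + b)
    return np.sqrt(x1 * x2) * np.array([[1, r], [r, 1]])

T1  = np.array([[2., 1.], [1., 2.]])
T1t = np.array([[4., -1.], [-1., 4.]])          # the matrix denoted by tilde T_1
T2  = np.array([[2., -1.], [-1., 2.]])

print(kahler_barycenter_2x2(T1, T2))            # [[2, 0], [0, 2]] = (T1 + T2)/2
print(np.linalg.eigvalsh(kahler_barycenter_2x2(T1t, T2)
                         - kahler_barycenter_2x2(T1, T2)))  # smallest = sqrt(10)-2-sqrt(2) < 0
print(sqrtm(T1 @ T2))                           # sqrt(3)*I, differs from the barycenter
```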

6 Numerical experiments

In this section, the different algorithms proposed in Sects. 4.2 and 4.3 will be compared with respect to speed and accuracy. The numerical experiments are confined to Toeplitz matrices, because of the interest in computing their structured matrix mean in applications [29]. These matrices are constructed randomly, but with a prescribed condition number, using the technique described in [18]. Performance, accuracy and computational distance are the subjects of the forthcoming investigation. For clarity we remind the reader that the Richardson iteration corresponds to a projection technique on the manifold endowed with the classical Euclidean inner product. For all algorithms, the stopping criterion is based on checking the relative size of the gradient and on comparing two consecutive iterates.

It is worth pointing out that, in spite of the lack of a proof of uniqueness of the structured geometric mean in the Toeplitz case, for any fixed set of data matrices used in our experiments, every initial value and every algorithm always yielded the same structured geometric mean. This suggests the conjecture that in the Toeplitz case the structured geometric mean is unique.

We have also compared the structured geometric mean obtained by our algorithms with the Kähler metric mean, obtaining in most experiments a relative difference of the order of \(10^{-1}\), which indicates that these two means are relatively far from each other.

6.1 The projection methods

Performance

The performance of the projection methods explained in Sect. 4 can be compared by looking at both the number of iterations the methods require and the total amount of computational time they need.

In Fig. 3(a), the evolution of the gradient over the iterations is displayed for both techniques (and hence also the number of iterations). Using the projection method introduced in Sect. 4.3 gives a faster decrease of the gradient and results in fewer iteration steps. The number of iterations remains almost constant for this method as the size of the matrices increases. For the projection technique from Sect. 4.2 on the other hand, this number starts to increase when the matrix size grows.

Fig. 3 Comparison of the projection methods for Toeplitz matrices. In the legends, Euclidean indicates the method of Sect. 4.2, Riemannian indicates the first approach described in Sect. 4.4, and Riem-FFT the second

However, comparing expressions (4.6) and (4.8), it can be seen that the second one is computationally more expensive, and hence the advantage of requiring fewer iterations could be nullified. Therefore, Fig. 3(b) displays the total computational time of both methods for varying matrix sizes (both approaches from Sect. 4.4 are shown). The two methods based on Sect. 4.3 maintain an advantage despite their larger computational cost per iteration. Note that for the largest matrix size the computational time of the Euclidean-based method appears lower than that of the other methods. However, this is caused by the increasing number of iterations required by the Euclidean method: the maximum number of iterations is reached before convergence and the algorithm is terminated prematurely. Concerning the operation counts in Sect. 4.4, the advantage of the FFT-based method starts to appear when the matrices become sufficiently large.

Accuracy

In order to analyze the accuracy of the projection methods, we implement a high precision version of the first algorithm in Sect. 4.4 using the vpa functionality of Matlab. The relative distance, based on the intrinsic distance (1.2), between this high precision computation and the result of the actual algorithms is shown in Fig. 4. For small condition numbers, the accuracy of all methods is similar on average, but as the conditioning of the matrices becomes worse, the accuracy of the projection method based on the Euclidean geometry deteriorates much faster than that of the method based on the Riemannian geometry. The former even fails to converge when the condition number of the matrices becomes very large. The accuracy of the two approaches in Sect. 4.4 is similar and deteriorates steadily as the condition numbers of the matrices increase.

Fig. 4 Accuracy of the projection methods when compared to a high precision version. The mean of the samples is connected by a line. In the legends, Euclidean indicates the method of Sect. 4.2, Riemannian indicates the first approach described in Sect. 4.4, and Riem-FFT the second

7 Conclusions

In this article a generalization of the Karcher mean for positive definite matrices to structured positive definite matrices was proposed. Besides a theoretical investigation and an adaptation of the desired properties of such a mean, algorithms for its computation were presented. In the design of the algorithms, two directions were pursued, one relying mostly on linear algebra and one based on differential geometry. A convergence analysis was performed, showing the superiority of the algorithm based on differential geometry. Numerical experiments compared the accuracy and speed of the various techniques and confirmed the theoretical analysis.

In the case of Toeplitz matrices, we have also considered the Kähler metric mean [4], investigating its properties and providing an explicit expression in the real case and a fast algorithm in the complex case. For Toeplitz matrices, neither the new structured geometric mean nor the Kähler metric mean has completely satisfactory properties: they are neither monotone, nor do they satisfy the arithmetic–geometric inequality. We wonder whether it is possible to provide a definition of a geometric mean for Toeplitz matrices which behaves well with respect to the ordering of positive matrices.