1 Introduction

Phylogenetic trees reflect the evolution of biological species. They are built from genetic variation over a set of taxa. Curiously, building them for the same set of taxa from different genes often results in fundamentally different trees, e.g. Rokas et al. (2003). This calls for statistics, for instance for averaging over different trees while controlling their uncertainty. It also calls for geometry, namely for designing suitable spaces of trees that are both biologically meaningful and numerically tractable.

A seminal model, abbreviated as the BHV-model, was proposed twenty years ago by Billera et al. (2001). It has the favorable property of being a Riemann stratified space of globally nonpositive curvature, thus admitting unique geodesics and unique Fréchet means. Additionally, since it is locally flat, an abundance of successful algorithms has been developed to compute these, suffering only from the inherent combinatorial complexity, e.g. Owen (2011); Bačák (2014); Miller et al. (2015); Brown and Owen (2018).

While this model is mathematically intriguing, new models have been developed more recently whose geometries more closely reflect the stochastic fundamentals of gene mutation, e.g. Moulton and Steel (2004); Shiers et al. (2016); Garba et al. (2020). In Garba et al. (2018), metrics for phylogenetic trees were proposed based on the information geometry of the two-state and the four-state models (four states because gene entries are taken from the four nucleotide bases). This study was continued in Garba et al. (2020, 2021), where, as a further simplification, a continuous model was proposed whose moments match those of the two-state model.

In this contribution, we briefly review the definition of our new wald space (cf. Garba et al. (2020)) and propose algorithms to compute geodesics and Fréchet means. On the one hand, wald space is geometrically more challenging. It is a stratified space that is isometrically embedded in the space \(\mathcal {P}\) of symmetric positive definite \(N\times N\) matrices (where \(N\in \mathbb {N}\) is the number of taxa), equipped with the well-known affine invariant geometry of globally nonpositive curvature; hence the need for algorithms as sophisticated as those for BHV-space. On the other hand, we believe that it is biologically more meaningful than BHV-space. For example, in BHV-space the distance between two different trees whose edge lengths become arbitrarily large diverges to infinity. In wald space, such trees converge to the completely disconnected forest, which, along with other forests, is a member of wald space. Hence these two trees become more and more similar. Simulations and data analyses reveal an advantage of wald space: degenerate trees seem to be less sticky in wald space than in BHV-space (sample means are called sticky if they have degenerate limiting distributions), cf. Hotz et al. (2013); Huckemann et al. (2015); Barden et al. (2013, 2018), which more easily allows for statistical inference.

Wald space was first proposed at the Oberwolfach workshop 1804 (2018) in the Black Forest (Schwarzwald in German, where Wald means forest).

2 Wald Space

Let \(N\in \mathbb {N}\) denote the number of taxa. A phylogenetic forest \((F,\ell )\) is

  (i) a forest \(F = (V,E)\) with a finite set of vertices V and undirected edges E such that any two vertices \(u,v \in V\) are connected by at most one edge, denoted by \(\{u,v\}\), and with labeled vertices \(L = \{1,\dots ,N\} \subseteq V\), where \(v\in V\setminus L\) implies that \(\deg (v)\ge 3\),

  (ii) together with a mapping \(\ell :E\rightarrow (0,\infty )\) of edge lengths.

Two phylogenetic forests are equivalent, \((F_1,\ell _1)\sim (F_2,\ell _2)\), if their label sets agree \(L_1 = L = L_2\) and if there is a graph isomorphism \(f:V_1\rightarrow V_2\) such that

  (i) \(f(u) = u\) for all \(u\in L\), and

  (ii) \(\ell _1(\{u, v\}) = \ell _2(\{f(u), f(v)\})\) for all \(\{u, v\}\in E_1\).

Definition 1

Every equivalence class \(W=[F,\ell ]\) is called a wald, and all equivalence classes form the wald space \({\mathcal {W}_{}}\); its geometric structure is defined below. Disregarding the edge length map \(\ell \), every equivalence class of forests F with respect to (i) above is called a wald topology. For a given wald \(W=[F,\ell ]\), the grove \({\mathcal {W}_{W}}\) of W comprises all \(W'=[F',\ell '] \in {\mathcal {W}_{}}\) such that \(F'\) and F have the same wald topology.

In the following, for any \(u,v\in V\) lying in the same connected component, \(E(u,v)\) denotes the set of edges along the unique path connecting u and v. For \(u=v\), any sum over \(E(u,u)\) is set to zero.

With this notation, the map \(\phi \) sending \(W=[F,\ell ]\) to the \(N\times N\) matrix with coordinate entry at \(u,v \in L\),

$$\big (\phi (W)\big )_{uv} = \big (\phi ([F,\ell ])\big )_{uv} := {\left\{ \begin{array}{ll} \exp \Big (-\sum _{e\in E(u,v)} \ell (e)\Big ),&{}\text {if } u \text { and } v \text { are connected},\\ 0,&{}\text {else}, \end{array}\right. } \tag{1}$$

is well defined and maps \({\mathcal {W}_{}}\) injectively into the set \(\mathcal {P}\) of symmetric positive definite \(N \times N\) matrices, cf. Garba et al. (2020).
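To make Eq. (1) concrete, here is a minimal sketch, assuming NumPy and a hypothetical edge-list encoding of a forest (leaves are the integers 0, ..., N-1, i.e. the labels 1, ..., N shifted by one; interior vertices are arbitrary other ids). The function name `phi_from_forest` is illustrative only. A breadth-first search from each leaf accumulates the sum of edge lengths over the unique path \(E(u,v)\); leaves in different components keep the entry 0.

```python
from collections import deque

import numpy as np


def phi_from_forest(n_leaves, edges):
    """Return the matrix phi(W) of Eq. (1) for a phylogenetic forest.

    Hypothetical input format: leaves are the vertices 0, ..., n_leaves - 1,
    interior vertices are any other hashable ids, and `edges` is a list of
    (u, v, length) triples with length > 0.
    """
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))

    phi = np.zeros((n_leaves, n_leaves))  # unconnected pairs keep the entry 0
    for source in range(n_leaves):
        # Breadth-first search from `source`. In a forest the path to every
        # reachable vertex is unique, so dist[v] is the sum of l(e) over E(source, v).
        dist = {source: 0.0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, length in adjacency.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + length
                    queue.append(v)
        for target in range(n_leaves):
            if target in dist:
                # Diagonal: empty sum over E(u, u), hence exp(0) = 1.
                phi[source, target] = np.exp(-dist[target])
    return phi


# Quartet ((0,1),(2,3)) with interior vertices "a", "b", plus an isolated leaf 4.
edges = [(0, "a", 0.1), (1, "a", 0.2), ("a", "b", 0.5), ("b", 2, 0.3), ("b", 3, 0.4)]
print(np.round(phi_from_forest(5, edges), 3))
```

The isolated leaf 4 yields a zero row and column apart from the diagonal entry 1, matching the "else" case of Eq. (1).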

Recall from Garba et al. (2020, 2021) that the affine invariant Riemannian metric on \(\mathcal {P}\) corresponds to the Fisher information geometry of zero-mean nondegenerate N-dimensional Gaussians induced by tree-indexed Gaussian processes, a continuous generalisation of the two-state model. This metric turns \(\mathcal {P}\) into a Riemannian manifold of globally nonpositive curvature (e.g. Lang (1999)), which guarantees unique geodesics and unique Fréchet means (e.g. Sturm (2003)). The squared distance induced on \(\mathcal {P}\) is given by

$$\begin{aligned} d_\mathcal {P}^2(P,Q) = \mathrm {Tr} \bigg [\log \Big (\sqrt{P}^{-1}Q\sqrt{P}^{-1}\Big )^2\bigg ] = \sum _{i = 1}^N \log (\mu _i)^2, \end{aligned}$$

where \(\sqrt{P}\) is the unique positive definite square root of \(P\) and \(\mu _i\) are the eigenvalues of \(P^{-1}Q\).
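The distance formula can be evaluated with symmetric eigendecompositions only. The sketch below assumes NumPy; `dist_P` is an illustrative name. It uses the fact that \(\sqrt{P}^{-1}Q\sqrt{P}^{-1}\) is symmetric and similar to \(P^{-1}Q\), so both have the same eigenvalues \(\mu _i\).

```python
import numpy as np


def dist_P(P, Q):
    """Affine-invariant distance on SPD matrices: sqrt(sum_i log(mu_i)^2)."""
    w, V = np.linalg.eigh(P)
    P_inv_sqrt = (V / np.sqrt(w)) @ V.T  # V diag(w^{-1/2}) V^T
    # P^{-1/2} Q P^{-1/2} is symmetric and has the eigenvalues mu_i of P^{-1} Q.
    mu = np.linalg.eigvalsh(P_inv_sqrt @ Q @ P_inv_sqrt)
    return float(np.sqrt(np.sum(np.log(mu) ** 2)))


P = np.array([[1.0, 0.5], [0.5, 1.0]])
Q = np.array([[2.0, 0.0], [0.0, 1.0]])
print(dist_P(P, Q), dist_P(Q, P))  # the distance is symmetric
```

A useful sanity check is affine invariance, \(d_\mathcal {P}(GPG^\top , GQG^\top ) = d_\mathcal {P}(P,Q)\) for every invertible G, which gives the metric its name.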

Definition 2

The metric \(d_{{\mathcal {W}_{}}}\) of the wald space is the pullback of \(d_\mathcal {P}\) under \(\phi \), which is given for \(W_1,W_2\in {\mathcal {W}_{}}\) by

$$\begin{aligned} d_{{\mathcal {W}_{}}}(W_1, W_2) = \inf _{\begin{array}{c} \gamma :[0,1]\rightarrow {\mathcal {W}_{}}\\ \phi \circ \gamma ~\mathrm{cont.}\;\mathrm{path},\\ \gamma (0)=W_1, \gamma (1)=W_2 \end{array}} L_{d_\mathcal {P}}(\phi \circ \gamma ), \end{aligned}$$

where \(L_{d_\mathcal {P}}(\gamma )\) is the length of the path \(\gamma \) measured in \(d_\mathcal {P}\). If no such path exists, we set \(d_{{\mathcal {W}_{}}}(W_1,W_2) = \infty \).

As noted above, trees whose edge lengths \(\ell \) tend to infinity move infinitely far apart in BHV geometry, whereas in wald geometry their distance goes to zero. This is reflected in the reparametrization \(W = [F,\lambda ]\) with \(\lambda := 1 - \exp (-\ell )\), which recasts (1) as

$$\begin{aligned} \big (\phi (W)\big )_{uv} = \big (\phi ([F,\lambda ])\big )_{uv} := {\left\{ \begin{array}{ll} \prod _{e\in E(u,v)} \big (1 - \lambda (e)\big ),&{}\text {if } u \text { and } v \text { are connected},\\ 0,&{}\text {else}. \end{array}\right. } \end{aligned}$$

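Letting \(\ell (e)\to \infty \), i.e. \(\lambda (e)\to 1\), for every edge e, all off-diagonal entries of \(\phi (W)\) tend to zero while the diagonal entries remain equal to one; hence \(\phi (W)\) tends to the identity matrix, which is the image of the completely disconnected forest. Moreover, if \(W=[F,\lambda ]\) with \(F=(V,E)\) having |E| edges, vectorizing \(\lambda \in (0,1)^{|E|}\) yields the following identification for the grove of W: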

$$ {\mathcal {W}_{W}}\cong (0,1)^{|E|}\,.$$
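The following sketch illustrates the reparametrization and the limiting behaviour on two fully resolved quartets with different topologies. It assumes NumPy; the quartet encoding and the names `lam_from_length` and `quartet_phi` are hypothetical. Note that it only computes the ambient distance \(d_\mathcal {P}\), which is a lower bound for \(d_{{\mathcal {W}_{}}}\) (every path length is at least the distance of its endpoints); it shows that \(\phi \) of both trees approaches the identity, not the wald distance itself.

```python
import numpy as np


def dist_P(P, Q):
    """Affine-invariant distance on SPD matrices."""
    w, V = np.linalg.eigh(P)
    P_inv_sqrt = (V / np.sqrt(w)) @ V.T
    mu = np.linalg.eigvalsh(P_inv_sqrt @ Q @ P_inv_sqrt)
    return float(np.sqrt(np.sum(np.log(mu) ** 2)))


def lam_from_length(ell):
    """Reparametrization lambda = 1 - exp(-l), mapping (0, inf) onto (0, 1)."""
    return 1.0 - np.exp(-np.asarray(ell, dtype=float))


def quartet_phi(lam, cherries):
    """phi(W) for a fully resolved quartet in the lambda-parametrization.

    Hypothetical encoding: cherries = ((a, b), (c, d)) splits the leaves
    {0, 1, 2, 3}; lam[i], i < 4, is the weight of the pendant edge of leaf i
    and lam[4] that of the single interior edge.
    """
    keep = 1.0 - np.asarray(lam, dtype=float)  # the factors (1 - lambda(e))
    (a, b), (c, d) = cherries
    phi = np.eye(4)  # diagonal: empty product equals 1
    for u in range(4):
        for v in range(u + 1, 4):
            same_cherry = {u, v} in ({a, b}, {c, d})
            # The path u--v uses both pendant edges, plus the interior edge
            # whenever u and v lie in different cherries.
            phi[u, v] = phi[v, u] = keep[u] * keep[v] * (1.0 if same_cherry else keep[4])
    return phi


# Two different topologies whose edge lengths all equal t.
for t in (0.5, 2.0, 8.0):
    lam = lam_from_length(np.full(5, t))
    W1 = quartet_phi(lam, ((0, 1), (2, 3)))
    W2 = quartet_phi(lam, ((0, 2), (1, 3)))
    print(t, dist_P(W1, W2), dist_P(W1, np.eye(4)))
```

As t grows, both printed distances shrink towards zero, consistent with both trees approaching the completely disconnected forest.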

Theorem 1

  1. For every wald \(W = [F,\lambda ]\), \(F=(V,E)\) with grove \({\mathcal {W}_{W}}\), the mapping \((0,1)^{|E|}\cong {\mathcal {W}_{W}} {\mathop {\rightarrow }\limits ^{\phi }}\mathcal {P}\) is an embedding.

  2. If \(W = [F,\lambda ]\) with a fully resolved (i.e. binary) tree F, then \({\mathcal {W}_{W}}\) is an open subset of \({\mathcal {W}_{}}\).

Proof

cf. Lueg et al. (2021).

In consequence, \({\mathcal {W}_{}}\) is a stratified space whose strata are the groves. As BHV-space can be viewed as a subset of wald space, cf. Garba et al. (2020), BHV-orthants are subsets of groves. In contrast to BHV-space, groves are connected not only to the star stratum (trees without interior edges) but also to forest strata, including the completely disconnected forest (N isolated vertices, no edges), which lies on the boundary of the star stratum.

3 Geodesics in Wald Space

We propose several algorithms to compute geodesics between two fully resolved trees \(W_1\) and \(W_2\), where Algorithm 4 is only applicable if \(W_1\) and \(W_2\) lie in a common grove \({\mathcal {W}_{W}}\). Dropping the embedding map \(\phi \), we regard wald space \({\mathcal {W}_{}}\) as a subset of the ambient space \(\mathcal {P}\). For \(P,Q\in \mathcal {P}\), denote the unique geodesic between P and Q by \(\gamma _{P,Q}:[0,1]\rightarrow \mathcal {P}\), the Riemannian exponential and logarithm by \(\mathrm {Exp}_P^{(\mathcal {P})}:T_P\mathcal {P}\rightarrow \mathcal {P}\) and \(\mathrm {Log}_P^{(\mathcal {P})}:\mathcal {P}\rightarrow T_P\mathcal {P}\), respectively, and the orthogonal tangent space projection by \(\pi _W:T_W\mathcal {P}\rightarrow T_W{\mathcal {W}_{}}\). Further, define the projection \(\pi :\mathcal {P}\rightarrow {\mathcal {W}_{}}, P\mapsto \pi (P):= \mathop {\mathrm {argmin}}\nolimits _{W\in {\mathcal {W}_{}}} d_{\mathcal {P}}(P,W)\), which is only well defined for \(P\in \mathcal {P}\) sufficiently close to \({\mathcal {W}_{}}\). The following is a very simple, naive algorithm.

[Algorithm 1: pseudocode figure not reproduced]

The next algorithm makes small (approximately geodesic) steps and successively takes the geodesic from the newest point to the destination (note the \(X_{i-1}\) and \(Y_{i-1}\) in the subscript in the update step).

[Algorithm 2: pseudocode figure not reproduced]

The following two algorithms are inspired by Schmidt et al. (2006). They iteratively update a given path by straightening it, eventually leading to a geodesic (cf. Figs. 1-4).

[Algorithm 3: pseudocode figure not reproduced]

For two walds \(W_1,W_2\in {\mathcal {W}_{[F]}}\) with the same fully resolved tree topology F, we exploit the manifold structure of groves and modify Algorithm 3 slightly, thus avoiding the projection.

[Algorithm 4: pseudocode figure not reproduced]
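The ambient operations \(\gamma _{P,Q}\), \(\mathrm {Exp}_P^{(\mathcal {P})}\) and \(\mathrm {Log}_P^{(\mathcal {P})}\) introduced at the beginning of this section have closed forms for the affine invariant metric. The sketch below implements these standard formulas with NumPy; the function names are illustrative, and the projections \(\pi \) and \(\pi _W\) onto wald space are deliberately not implemented here, since they require the grove structure.

```python
import numpy as np


def _sym_fun(S, fun):
    """f(S) for a symmetric matrix S, computed via its eigendecomposition."""
    S = 0.5 * (S + S.T)  # remove rounding asymmetry
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T


def _sqrt_pair(P):
    """Return (P^{1/2}, P^{-1/2}) for an SPD matrix P."""
    return _sym_fun(P, np.sqrt), _sym_fun(P, lambda w: 1.0 / np.sqrt(w))


def exp_P(P, V):
    """Exp_P(V) = P^{1/2} expm(P^{-1/2} V P^{-1/2}) P^{1/2} for symmetric V."""
    s, s_inv = _sqrt_pair(P)
    return s @ _sym_fun(s_inv @ V @ s_inv, np.exp) @ s


def log_P(P, Q):
    """Log_P(Q) = P^{1/2} logm(P^{-1/2} Q P^{-1/2}) P^{1/2}."""
    s, s_inv = _sqrt_pair(P)
    return s @ _sym_fun(s_inv @ Q @ s_inv, np.log) @ s


def geodesic_P(P, Q, t):
    """gamma_{P,Q}(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2} = Exp_P(t Log_P(Q))."""
    s, s_inv = _sqrt_pair(P)
    return s @ _sym_fun(s_inv @ Q @ s_inv, lambda w: w ** t) @ s


rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 3, 3))
P, Q = A @ A.T + np.eye(3), B @ B.T + np.eye(3)
# n = 5 equally spaced points on the ambient geodesic from P to Q.
path = [geodesic_P(P, Q, i / 4) for i in range(5)]
```

Points produced this way generally leave wald space, which is why the algorithms above combine these ambient steps with a projection (or, in Algorithm 4, with the grove's manifold structure).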

We measure the quality of a proposed path \((X_1,\dots ,X_n)\), \(3\le n\in \mathbb {N}\), by its length,

$$\begin{aligned} L(X_1,\dots ,X_n) = \sum _{i=1}^{n-1} d_{\mathcal {P}}(X_i, X_{i+1}) \end{aligned}$$

and its energy,

$$\begin{aligned} E(X_1,\dots ,X_n) = \frac{1}{2}\sum _{i = 1}^{n-1} d_{\mathcal {P}}(X_i, X_{i+1})^2. \end{aligned}$$
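Both quality measures are straightforward to evaluate for a discrete path. The sketch below assumes NumPy and illustrative function names. By the Cauchy-Schwarz inequality, \(E \ge L^2/(2(n-1))\), with equality exactly when all \(n-1\) steps have equal length; the example checks this on equally and unequally spaced points of the same ambient geodesic, which share the same length but not the same energy.

```python
import numpy as np


def _sym_fun(S, fun):
    """f(S) for a symmetric matrix S via its eigendecomposition."""
    S = 0.5 * (S + S.T)
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T


def dist_P(P, Q):
    """Affine-invariant distance sqrt(sum_i log(mu_i)^2), mu_i eigenvalues of P^{-1}Q."""
    s_inv = _sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    mu = np.linalg.eigvalsh(s_inv @ Q @ s_inv)
    return float(np.sqrt(np.sum(np.log(mu) ** 2)))


def geodesic_P(P, Q, t):
    """Ambient geodesic gamma_{P,Q}(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}."""
    s = _sym_fun(P, np.sqrt)
    s_inv = _sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    return s @ _sym_fun(s_inv @ Q @ s_inv, lambda w: w ** t) @ s


def path_length(points):
    """L(X_1, ..., X_n): sum of ambient distances between consecutive points."""
    return sum(dist_P(X, Y) for X, Y in zip(points[:-1], points[1:]))


def path_energy(points):
    """E(X_1, ..., X_n): half the sum of squared consecutive distances."""
    return 0.5 * sum(dist_P(X, Y) ** 2 for X, Y in zip(points[:-1], points[1:]))


rng = np.random.default_rng(1)
A, B = rng.normal(size=(2, 3, 3))
P, Q = A @ A.T + np.eye(3), B @ B.T + np.eye(3)
even = [geodesic_P(P, Q, i / 6) for i in range(7)]  # n = 7, equal steps
uneven = [geodesic_P(P, Q, t) for t in (0.0, 0.05, 0.1, 0.5, 0.9, 0.95, 1.0)]
print(path_length(even), path_length(uneven))  # both equal d_P(P, Q) up to rounding
print(path_energy(even), path_energy(uneven))  # the uneven path has larger energy
```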
Fig. 1. Tree with edge weights \(\lambda ^{(1)} = (0.5, \dots , 0.5, 0.1, 0.8)\) and \(\lambda ^{(2)} = (0.5, \dots , 0.5, 0.9, 0.1)\) for the computation of geodesics in Fig. 2.

Fig. 2. Length (left) and energy (center) of paths between the two trees from Fig. 1 obtained from the four algorithms for \(n=4,7,\dots ,46,49\). Right: coordinates \(\lambda _6,\lambda _7\) of the paths obtained from the four algorithms for \(n = 10\). Note that (NP), (IPS) and (EPS) almost coincide.

Fig. 3. Left: coordinate representation (interior edges only) of several neighbouring groves and of two walds \(W_1,W_2\in {\mathcal {W}_{}}\). Second to rightmost panel: selected iterations of the (EPS) algorithm for different starting paths, namely the output of the (SP) algorithm, the cone path and a round path, respectively. All paths have \(n = 25\) points.

Fig. 4. Length (left) and energy (right) of the paths over the iterations of the (EPS) algorithm for different starting paths.

4 Comparing Fréchet Means

For illustration, we take \(n=3\) trees \(W_1,\dots ,W_n\in {\mathcal {W}_{}}\) from Nye et al. (2016), depicted in Fig. 5, each having \(N=5\) leaves (three taxa and the root were removed from the original trees for computational tractability). We compute their Fréchet means

$$\begin{aligned} W^*\in \mathop {\mathrm {argmin}}\limits _{W\in {\mathcal {W}_{}}}\sum _{k = 1}^n d_{{\mathcal {W}_{}}}\big (W_k, W\big )^2 \end{aligned}$$

in BHV-space and in wald space, cf. Fig. 6. For the computation we use the algorithm of Sturm (2003). In general, other types of means can also be computed (e.g. the Riemannian 1-center, cf. Arnaudon et al. (2013)).
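For orientation, here is a sketch of the inductive-mean iteration associated with Sturm (2003), \(S_1 = X_{i_1}\), \(S_{k+1} = \gamma _{S_k, X_{i_{k+1}}}\big (1/(k+1)\big )\). It runs in the ambient space \(\mathcal {P}\) only, where geodesics are available in closed form; in wald space each step would instead require one of the geodesic algorithms of Sect. 3, so this is not the computation behind Fig. 6. It assumes NumPy; the function names and the deterministic cyclic variant (used when `rng=None`) are choices made for this sketch.

```python
import numpy as np


def _sym_fun(S, fun):
    """f(S) for a symmetric matrix S via its eigendecomposition."""
    S = 0.5 * (S + S.T)
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T


def geodesic_P(P, Q, t):
    """Ambient geodesic gamma_{P,Q}(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}."""
    s = _sym_fun(P, np.sqrt)
    s_inv = _sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    return s @ _sym_fun(s_inv @ Q @ s_inv, lambda w: w ** t) @ s


def inductive_mean(points, n_iter, rng=None):
    """Inductive mean S_{k+1} = gamma_{S_k, X_{i_{k+1}}}(1 / (k + 1)), S_1 = X_{i_1}.

    rng=None visits the data cyclically (a deterministic variant chosen for
    this sketch); with a numpy Generator the indices i_k are drawn uniformly
    at random, as in Sturm's law of large numbers.
    """
    n = len(points)

    def pick(k):
        return k % n if rng is None else int(rng.integers(n))

    S = points[pick(0)]
    for k in range(1, n_iter):
        S = geodesic_P(S, points[pick(k)], 1.0 / (k + 1))
    return S


# Commuting (diagonal) data: here the Frechet mean in P is the entrywise
# geometric mean, diag((1*4*2)^(1/3), (2*0.5*8)^(1/3)) = diag(2, 2).
X = [np.diag([1.0, 2.0]), np.diag([4.0, 0.5]), np.diag([2.0, 8.0])]
print(inductive_mean(X, n_iter=300))
print(inductive_mean(X, n_iter=5000, rng=np.random.default_rng(0)))
```

For commuting matrices each step is a running average of the matrix logarithms, so the cyclic variant reproduces the geometric mean exactly after a whole number of cycles, while the random variant only converges in the limit.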

While the Fréchet mean is unique in BHV-space, its uniqueness in wald space is unclear. For both spaces we performed 15 iterations, after which the final consecutive iterates were less than 0.05 apart. Remarkably, the mean tree in BHV-space is a star tree, whereas in wald space it is a fully resolved tree.

Fig. 5. Three trees \(W_1,W_2,W_3\in {\mathcal {W}_{}}\) from Nye et al. (2016). Their Fréchet means are depicted in Fig. 6.

Fig. 6. Fréchet means of the three trees from Fig. 5. Left: in BHV-space; right: in wald space.