1 Introduction

Phylogenetic metrics are often used to analyse populations of phylogenetic trees, generated by some Bayesian method or by different methods of tree reconstruction from data. Such metrics are also useful to define some empirical statistics of such populations, see, for example, Nye (2011) and Benner et al. (2014). There are already a lot of phylogenetic distances available.

The simplest one, though not the oldest one, seems to be the Robinson–Foulds distance introduced in Bourque (1978), see also Robinson and Foulds (1979) and Robinson and Foulds (1981). That one is easy and efficiently to compute in linear time (Day 1985) or even in sublinear approximation (Pattengale et al. 2007). But, it has no much power in discriminating trees, since a lot of trees with similar biological meaning have distance equal to the diameter of the unweighted tree space. Much nearer to biology seems to be a variant of the Robinson–Foulds distance, the weighted matching distance. It captures similarity of splits which entails a lot of biology and is still computable in subcubic time (Bogdanowicz and Giaro 2012; Lin et al. 2012). Another good alternative to the Robinson–Foulds metric is the quartet distance (Estabrook et al. 1985). Using the induced quartet trees instead of the induced splits, it is much more biologically plausible than the Robinson–Foulds distance and also efficiently computable (Brodal et al. 2001).

A quite natural, biology adapted way of capturing tree similarity is provided by the tree rearrangement metrics. There are different basic transformations giving rise to the NNI distance (Robinson 1971), SPR distance and TBR distance. Unfortunately, computation of those distances is NP-hard and only feasible for small trees (DasGupta et al. 1997; Allen and Steel 2001; Bonet and St. John 2010). Some fixed parameter approach to compute the (rooted) SPR distance, e.g, was done in Whidden et al. (2016). Even more at the heart of evolution is the maximum parsimony distance (Fischer and Kelk 2016; Moulton and Wu 2015). Still it is NP-hard to compute that distance, even between binary unweighted phylogenetic trees (Fischer and Kelk 2016; Kelk and Fischer 2017; Bernstein 2017).

For weighted rooted phylogenetic trees, there is the euclidean type geodesic distance on tree space introduced by Billera et al. (2001). The crucial observation was that in a natural way tree space is a space of nonpositive curvature or \(\mathrm {CAT}(0)\) space, a notion introduced by Gromov. This property implies uniqueness of geodesics. Owen and Provan (2011) provided a polynomial time algorithm for computing this metric. Although this metric was defined on rooted trees, also a version on unrooted trees is used, see, for example, Nye (2011). The \(\mathrm {CAT}(0)\) idea was used again in Gavryushkin and Drummond (2016) to develop metrics for ultrametric trees. Again, efficient computation of the geodesics is possible for at least one of the metrics. Further, different natural parametrisations may yield different geodesics.

A similar approach for weighted rooted trees, see Sokal and Rohlf (1962), uses all distances of the most recent common ancestors of pairs of taxa to the root. Recently, Kendall and Colijn (2016) returned to this idea. The authors also experimented with weighting different most recent common ancestors depending on their position in the trees.

Another way to compare phylogenetic trees is to compare the metrics they induce on the taxon set. This distance-based approach is feasible since by the work of Buneman (1971, 1974), see also Zaretskii (1965) for the unweighted case, we can identify tree-induced metrics among all metrics by the famous four-point conditions. Also, under some natural minimality assumption, the tree can be identified up to isomorphy (see Lemma 3). This approach is particularly appealing since the distance between two taxa is easy to derive, to estimate and to interpret as evolutionary distance. Another of Gromov’s ideas, Gromov’s tree, can be used to approximate arbitrary distance data by tree-induced distance data (Dress et al. 2005).

Compared to the variety of metrics reviewed above and given the popularity of distance-based methods for tree reconstruction, it is surprising that the only distance-based metrics between phylogenetic trees are pathwise difference metrics, the \(\ell ^1\) and \(\ell ^2\) versions of which are well established (Williams and Clifford 1971; Penny and Hendy 1985). The \(\ell ^\infty \) version of those metrics became of interest only recently, especially in the context of tropical geometry (Huggins et al. 2012; Bernstein and Long 2017; Coons and Rusinko 2016; Lin et al. 2017). All three metrics compare the tree-induced metrics just as real vectors. Thus, they can be computed efficiently.

In the present paper, we want to construct other distance-based phylogenetic metrics. Instead of considering the distance data as just real vectors, we are looking for metrics using the metric space properties. For (compact) metric spaces, there is the well-known Gromov–Hausdorff distance (Gromov 1981)

$$\begin{aligned} D^{GH}((X,d),(X',d'))=\inf _{\varphi ,\varphi '}\rho ^H(\varphi (X),\varphi '(X')) \end{aligned}$$
(1)

where the infimum is taken over all isometric embeddings of \(X,X'\) into a common metric space, and \(\rho ^H\) is the Hausdorff metric on the compacts of that space. In fact, this distance was introduced in a different way already in Edwards (1975), see Tuzhilin (2016).

Unfortunately, computing the Gromov–Hausdorff distance between finite metric spaces is NP-hard (Mémoli 2007), even considering the metric spaces induced by metric trees (Agarwal et al. 2015). There is recent work on approximation algorithms (Agarwal et al. 2015) and relaxations of the metric or related optimisation problems (Mémoli 2007; Villar et al. 2016).

Applied to tree-induced metrics, this definition induces a semimetric on the space of all weighted trees. But, we cannot distinguish trees with permuted labels. To give a short argument, consider the two potential phylogenies in Fig. 1. The left one is common believe in the evolutionary history of human, bear, dog and sheep. The right one is completely unacceptable. This means that any meaningful distance between those two trees must be positive. Unfortunately, Gromov–Hausdorff distance doesn’t have this property since permutation of the leaf labels induces an isomorphism of metric spaces. Since we want to compare the whole trees, not just tree shapes, we have to modify the metric (1). That makes the definition more complicated (see Sect. 2) since we have to match the leaf labels, but the idea of joint embeddings remains. Fortunately, our metric becomes efficiently computable this way. Since there are several reasonable candidates to substitute the Hausdorff metric in (1), we derive three different metrics. In all three cases, the value of the metric is the solution of a linear or quadratic program of polynomial size. Much of the mathematics presented in Sect. 4 aims at reducing the size of those programs to get solutions as fast as possible.

Fig. 1
figure 1

Two hypothetical phylogenies of human, dog, bear and sheep. The right one is obtained by just permuting the labels of the leaves of the common sense phylogenetic tree on the left

For mathematical reasons, it is very convenient to include also semimetrics on the taxon set in the definition. This situation may occur in phylogenetics if we do not resolve the topology by all singleton splits, see, for instance, Robinson and Foulds (1981).

Summarisingly, we are looking for metrics on the space of weighted phylogenetic trees which are both computable in polynomial time and able to capture some biological similarity. We show in Sect. 2 how to apply Gromov’s idea of joint embeddings to define metrics on the space of semimetrics. Section 3 defines their counterparts on the spaces of weighted X-trees, unweighted phylogenetic trees and unweighted binary phylogenetic trees. Then, Sect. 4 demonstrates how to compute these metrics efficiently. We compare the metrics with the pathwise difference metrics and the NNI distance in Sect. 5. Some special computations in Sect. 6 show how our metrics behave under the change of one or two edge lengths. A small simulation study is done in Sect. 7. Finally, Sect. 8 discusses several extensions and open questions.

2 Distances Between Semimetrics

For a finite set X denote by M(X) the set of all semimetrics on X, i.e. all \(\rho :X\times X\rightarrow \mathbb {R}_{\ge 0}\) such that for all \(x,y,z\in X\) \(\rho (x,x)=0\), \(\rho (x,y)=\rho (y,x)\) and \(\rho (x,y)\le \rho (x,z)+\rho (z,y)\). Frequently, we describe such a semimetrics in an equivalent fashion by \(\rho :\left( {\begin{array}{c}X\\ 2\end{array}}\right) \rightarrow \mathbb {R}_{\ge 0}\) where \(\left( {\begin{array}{c}X\\ 2\end{array}}\right) =\left\{ \left\{ x,y\right\} :x,y\in X,x\ne y\right\} \). Let \(\mathcal M=\left\{ (Y,\rho ):\#Y<\infty ,\rho \in M(Y)\right\} \) denote the set of all finite semimetric spaces. Isometries \(\varphi :(X,\rho )\rightarrow (Y,\rho ')\) preserve the semimetrics, i.e. for all \(x,y\in X\) \( \rho (x,y)=\rho '(\varphi (x),\varphi (y))\), but need not be injective.

Definition 1

Let X be a finite set. Then, we define three functions \(\tilde{D}_1,\tilde{D}_2,\tilde{D}_\infty \) on \(M(X)\times M(X) \) by

$$\begin{aligned} \tilde{D}_1(\rho ,\rho ')= & {} \inf _{Y,\varphi ,\psi }\sum _{x\in X}\bar{d}(\varphi (x),\psi (x))\\ \tilde{D}_2(\rho ,\rho ')^2= & {} \inf _{Y,\varphi ,\psi }\sum _{x\in X}\bar{d}(\varphi (x),\psi (x))^2\\ \tilde{D}_\infty (\rho ,\rho ')= & {} \inf _{Y,\varphi ,\psi }\max _{x\in X}\bar{d}(\varphi (x),\psi (x)), \end{aligned}$$

where the infimum is taken over all \((Y,\bar{d})\in \mathcal M\) and all isometries \(\varphi :(X,\rho )\rightarrow (Y,\bar{d})\), \(\psi :(X,\rho ')\rightarrow (Y,\bar{d})\).

Note that \(\tilde{D}_\infty \) is nearest to the Gromov–Hausdorff distance, which we should implement via

$$\begin{aligned} D_{GH}(\rho ,\rho ')=\inf _{Y,\varphi ,\psi }\max (\max _{x\in X}\min _{y\in X}\bar{d}(\varphi (x),\psi (y)),\max _{y\in X}\min _{x\in X}\bar{d}(\varphi (x),\psi (y))). \end{aligned}$$

It is not complicated to deduce (see, for example, Mémoli 2007) that then the optimum is achieved by matching the points of \(\varphi (X)\) with points of \(\psi (Y)\) and find a matching with all distances as small as possible. In difference to this, all \(D_i\) match \(\varphi (x)\) with \(\psi (x)\) to compute the distance. This implies that just permuting labels to define \(\rho '\) yields nonnull distances, see the discussion around Fig. 1. Another consequence is that we compute upper bounds for the Gromov–Hausdorff distance, but this is not needed later.

We will see now that it is enough to have just one model space Y. This simplifies the optimisation problems in Definition 1 considerably. Frequently we need identical copies of our taxon set X. Using a slightly informal notation, we denote them \(X'=\left\{ x':x\in X\right\} \) and \(X''=\left\{ x'':x\in X\right\} \). To \(\rho ,\rho '\in M(X)\), we associate now the set \(E(\rho ,\rho ')\) of their extensions

$$\begin{aligned} E(\rho ,\rho ')=\left\{ \bar{d}\in M(X\cup X'): \bar{d}(x,y)=\rho (x,y),\bar{d}(x',y')=\rho '(x,y)\forall x,y\in X\right\} . \end{aligned}$$

So every extension reproduces the distances from \(\rho \) on X and the distances from \(\rho '\) on \(X'\). Just the distances \(\bar{d}(x,y')\) and \(\bar{d}(x',y)\), \(x,y\in X\) between the two images of X are not fully determined, but only constrained through the metric properties of \(\bar{d}\). It is important and easy to see that \(E(\rho ,\rho ')\), considered as a subset of \(\mathbb {R}^{(X\cup X')\times ( X\cup X')}\), is convex.

Let \(\left||\cdot \right||_i\) denote the usual \(\ell ^i\)-norm on \(\mathbb {R}^X\). We obtain a simple reformulation of Definition 1:

Lemma 1

For \(i=1,2,\infty \)

$$\begin{aligned} \tilde{D}_i(\rho ,\rho ')=\inf _{\bar{d}\in E(\rho ,\rho ')}\left||(\bar{d}(x,x'))_{x\in X}\right||_i. \end{aligned}$$
(2)

Thus, to compute the distances \(\tilde{D}_i(\rho ,\rho ')\), just one convex function over the convex set \(E(\rho ,\rho ')\) has to be minimised. Compactness of the sublevel sets of the norms \(\left||\cdot \right||_i\) gives directly

Lemma 2

For \(i=1,2,\infty \), there exists a \(d^*_i\in E(\rho ,\rho ')\) such that

$$\begin{aligned} \tilde{D}_i(\rho ,\rho ')=\left||(d_i^*(x,x'))_{x\in X}\right||_i. \end{aligned}$$

Now we are ready to prove that we defined metrics.

Theorem 1

\(\tilde{D}_i\), \(i=1,2,\infty \), are metrics on M(X).

Proof

Symmetry is clear.

If \(\tilde{D}_i(\rho ,\rho ')=0\), choose \(d_i^*\in E(\rho ,\rho ') \) according to the previous lemma. Obviously, we obtain \(d_i^*(x,x')=0\) for all \(x\in X\). The triangle inequality implies for all \(x,y\in X\)

$$\begin{aligned} \rho (x,y)=d^*_i(x,y)=d_i^*(x',y')=\rho '(x,y) \end{aligned}$$

such that \(\rho =\rho '\).

Now fix \(\rho ,\rho ',\rho ''\in M(X)\). Using again Lemma 2, we choose \(d_1\in M(X\cup X')\) extending \(\rho ,\rho '\) and \(d_2\in M(X'\cup X'')\) extending \(\rho ',\rho ''\) such that

$$\begin{aligned} \tilde{D}_i(\rho ,\rho ')= & {} \left||(d_1(x,x'))_{x\in X}\right||_i\\ \tilde{D}_i(\rho ',\rho '')= & {} \left||(d_2(x',x''))_{x\in X}\right||_i. \end{aligned}$$

Following Cristina (2008) or Lemma 7 in “Appendix”, we find some \(d\in M(X\cup X'\cup X'')\) extending both \(d_1,d_2\):

$$\begin{aligned} d|_{\left( {\begin{array}{c}X\cup X'\\ 2\end{array}}\right) }=d_1\quad \text {and}\quad d|_{\left( {\begin{array}{c}X'\cup X''\\ 2\end{array}}\right) }=d_2. \end{aligned}$$

We see now from monotonicity of the \(\ell ^i\)-norms on \(\mathbb {R}_{\ge 0}^X\) and their triangle inequalities that

$$\begin{aligned} \tilde{D}_i(\rho ,\rho '')\le & {} \left||(d(x,x''))_{x\in X}\right||_i\le \left||(d(x,x')+d(x',x''))_{x\in X}\right||_i\\\le & {} \left||(d(x,x'))_{x\in X}\right||_i+\left||(d(x',x''))_{x\in X}\right||_i\\= & {} \left||(d_1(x,x'))_{x\in X}\right||_i+\left||(d_2(x',x''))_{x\in X}\right||_i\\= & {} \tilde{D}_i(\rho ,\rho ')+ \tilde{D}_i(\rho ',\rho ''). \end{aligned}$$

\(\square \)

3 Distances Between X-Trees

We are mainly interested in metrics on tree space. To get a metric space from a tree (or a graph), we metrise trees by the lengths of shortest paths. To start, let us introduce some phylogenetic and graph theoretic notions. For more details, see Semple and Steel (2003).

Let \(G=(V,E,q)\) be a weighted connected graph, i.e. \(E\subseteq \left( {\begin{array}{c}V\\ 2\end{array}}\right) \) and \(q:E\rightarrow \mathbb {R}_{\ge 0}\). Then, we define the induced semimetric on V by

$$\begin{aligned} d_G^q(x,y)=\inf \left\{ \mathrm {len}(p):p \text{ path } \text{ from } \text{ x } \text{ to } \text{ y } \right\} \end{aligned}$$
(3)

where

$$\begin{aligned} \mathrm {len}(x_0x_1\dots x_m)=\sum _{i=1}^mq(\left\{ x_{i-1},x_i\right\} ) \end{aligned}$$

is the length of the path \(x_0x_1\dots x_m\) from \(x_0\) to \(x_m\). For unweighted graphs (VE), we assume \(q(\left\{ x,y\right\} )=1\) for all \(\left\{ x,y\right\} \in E\).

We only consider unrooted trees, i.e. connected acyclic graphs (VE). A weighted X-tree is a quadruple \((V,E,q,\mu )\), where (VE) is a tree, \(\mu :X\rightarrow V\) is a (not necessarily injective) map and \(q:E\rightarrow \mathbb {R}_{>0}\) is a weight function. Additionally, it is required that all vertices \(v\in V\) of degree \(\le 2\) are included in \(\mu (X)\). Identifying isomorphic variants, let the tree space T(X) be the set of all weighted X-trees. Unweighted phylogenetic X-trees are characterised by all edges having weight 1 and by \(\mu \) being an injective map onto the set of all vertices \(v\in V\) of degree 1, which implies absence of vertices of degree 2. The corresponding subspace of T(X) will be denoted \(T_1(X)\). In a binary (bifurcating) phylogenetic X-tree, all vertices have degree 1 or 3. We denote the set of binary phylogenetic X-trees by \(T_1^2(X)\).

For an X-tree, \(\tau =(V,E,q,\mu )\) denote the induced semimetric on X by \(\rho _\tau \):

$$\begin{aligned} \rho _\tau (x,y)=d_{(V,E)}^q(\mu (x),\mu (y)), \quad x,y\in X. \end{aligned}$$

This means that \(\rho _\tau (x,y)\) is the length of the shortest path between the leaves (labelled) x and y. Then, we define for two weighted X-trees \(\tau ,\tau '\in T(X)\) and \(i=1,2,\infty \)

$$\begin{aligned} D_i(\tau ,\tau ')=\tilde{D}_i(\rho _\tau ,\rho _{\tau '}). \end{aligned}$$

Again, all three \(D_i\) are metrics on tree space. This can be seen from the following characterisation of tree-induced semimetrics, provided in essence by Buneman (1971).

Lemma 3

For \(\rho \in M(X)\), there exists a weighted X-tree \(\tau \) with \(\rho =\rho _\tau \) if and only if for all \(x,y,z,w\in X\) the four-point condition

$$\begin{aligned} \rho (x,y)+\rho (z,w)\le \max (\rho (x,z)+\rho (y,w), \rho (x,w)+\rho (y,z)) \end{aligned}$$
(4)

is fulfilled.

Given such a \(\rho \in M(X)\), all X-trees \(\tau \) with \(\rho =\rho _\tau \) are isomorphic.

Proof

Necessity of (4) is proved in the same way as for metrics \(\rho \), see Lemma 7.1.7 of Semple and Steel (2003).

For sufficiency and uniqueness, let \(\tilde{X}\) be the set of equivalence classes of X under identifying points \(x,y\in X\) with \(\rho (x,y)=0\). We define \(\tilde{\rho }\) on \(\tilde{X}\) through \(\tilde{\rho }([x],[y])=\rho (x,y)\) where [x], [y] are the equivalence classes of \(x,y\in X\). The triangle inequality for \(\rho \) implies that \(\tilde{\rho }\) is a well-defined metric on \(\tilde{X}\). Further, (4) is fulfilled for \(\tilde{\rho }\), too. Thus, there exists a weighted \(\tilde{X}\)-tree \(\tilde{\tau }=(V,E,\tilde{\mu },q)\) inducing \(\tilde{\rho }\) (Buneman 1971). Defining \(\mu =\tilde{\mu }\circ [\cdot ]\), the X-tree \(\tau =(V,E,\mu ,q)\) induces \(\rho \).

Let \(\tau '=(V',E',\mu ',q')\) be another weighted X-tree inducing \(\rho \). Since \(q'\) is strictly positive, any \(x,y\in X\) with \(\rho (x,y)=0\) must fulfil \(\mu '(x)=\mu '(y)\). Thus, there is a mapping \(\tilde{\mu }':\tilde{X}\rightarrow V'\) with \(\tilde{\mu }'=\mu '\circ [\cdot ]\). This gives us the weighted \(\tilde{X}\)-tree \(\tilde{\tau }'=(V',E',\tilde{\mu }',q')\) inducing the metric \(\tilde{\rho }\). By Theorem 7.1.8 of Semple and Steel (2003), \(\tilde{\tau }\) and \(\tilde{\tau }'\) are isomorphic. Thus, \(\tau \) and \(\tau '\) are isomorphic, too. \(\square \)

For tree-induced metrics \(\rho _\tau ,\rho _{\tau '}\), we can consider extensions \(\bar{d}\in E(\rho _\tau ,\rho _{\tau '})\) as being induced by a graph metric on \(X\cup X'\). Let us look at one example.

Example 1

We want to compare for \(X=\left\{ A,B,C,D\right\} \) the two unweighted X-trees

with corresponding distances \(\rho =\rho _\tau ,\rho '=\rho _{\tau '}\).

We want to derive possible extensions \(\bar{d}\) of \(\rho ,\rho '\). For this goal, denote \(\bar{d}(x,x')=\delta _x\), \(x=A,B,C,D\). Then, \(\delta _A,\delta _B,\delta _C,\delta _D\ge 0\) should be compatible with the distances on the weighted graph

see Theorem 6. One possible choice is \(\delta _A=0,\delta _B=1,\delta _C=1,\delta _D=0\), i.e.

is consistent. Obviously, we embedded now both \(\tau \) and \(\tau '\) isometrically into the metric space of the graph \(G_1\). We see \(D_\infty (\tau ,\tau ')\le 1\), \(D_2(\tau ,\tau ')\le \sqrt{2}\) and \(D_1(\tau ,\tau ')\le 2\). In fact, equality holds for \(D_1(\tau ,\tau ')\), see Example 2. Picturely, we look for graphs similar to \(G_1\) with “shortest” bridges between the left and the right sides. The meaning of “short” is given by the \(\ell ^i\)-norm.

In biological terms, the trees \(\tau \) and \(\tau '\) or their induced metrics \(\rho _\tau \) and \(\rho _{\tau '}\), respectively, entail certain (genetic) differences between the taxa \(A,B,C,\dots \) and \(A',B',C',\dots \). Those differences we try to match in a parsimonious (by minimisation), but consistent (\(\bar{d}\) is still a metric) way. For example, A and \(A'\) could stand for different individuals of taxon A and similarly for BCD, and we try to get a parsimonious yet consistent picture of possible mutations in the genealogy of those individuals.

It is an important general feature of the minimisation problem in (2) that we need not fix the whole extension \(\bar{d}\). It is enough to study the constraints on the variables \(\delta _x\), \(x=A,B,C,D\). This is elaborated in the next section.

4 Efficient Computation

As already mentioned after Lemma 1, we can compute \(\tilde{D}_1\) and \(\tilde{D}_\infty \) by solving a linear program and \(\tilde{D}_2\) by solving a quadratic program. Thus, we can compute the distance in a time polynomially bounded in \(n=\#X\) (Karmarkar 1984). In the naïve way, the linear (quadratic) program has the \(n^2\) variables \(\epsilon _{xy}=\bar{d}(x,y')\) and \(\mathrm {O}(n^3)\) constraints resulting essentially from the triangle inequalities in triangles of the form \(x,y,z'\) or similar. But, we can do the computation more efficiently. Observe that the objective function in (2) depends on the unknown values \((\delta _x)_{x\in X}\), \(\delta _x=\epsilon _{xx}=\bar{d}(x,x')\) only. The reformulation of the constraints forced by \(\bar{d}\) being a semimetric using that variables only is provided by the following theorem. We prove it in “Appendix”.

Theorem 2

(quadrangle inequalities) Let \(\rho ,\rho '\in M(X)\) and \((\delta _x)_{x\in X}\in \mathbb {R}_{\ge 0}^X\) be given. Then, there exists a \(\bar{d}\in E(\rho ,\rho ')\) with

$$\begin{aligned} \bar{d}(x,x')=\delta _x,\quad x \in X, \end{aligned}$$

if and only if for all \(x\ne y\in X\) the following inequalities are fulfilled:

$$\begin{aligned} \delta _x+\delta _y\ge & {} \left|\rho (x,y)-\rho '(x,y)\right|\nonumber \\ \left|\delta _x-\delta _y\right|\le & {} \rho (x,y)+\rho '(x,y). \end{aligned}$$
(5)

Consequently, \(\tilde{D}_i(\rho ,\rho ')\) solves the program

$$\begin{aligned} \left||\delta \right||_i\rightarrow & {} \min \quad \text{ under }\nonumber \\ \delta _x+\delta _y\ge & {} \left|\rho (x,y)-\rho '(x,y)\right|\quad x, y\in X\nonumber \\ \left|\delta _x-\delta _y\right|\le & {} \rho (x,y)+\rho '(x,y)\quad x\ne y\in X. \end{aligned}$$
(6)

Observe that \(x=y\) in the second line yields \(\delta _x\ge 0\). Thus, \(\tilde{D}_i(\rho ,\rho ')\) can be obtained as solution of a linear (quadratic) program in the n variables \(\delta _{x}=\bar{d}(x,x')\) with \(\mathrm {O}(n^2)\) constraints.

Example 2

Let us continue Example 1 and compute \(D_i(\tau ,\tau ')\) exactly. Since \(\rho (A,D)=\rho '(A,D)\) and \(\rho (B,C)=\rho '(B,C)\), the nontrivial constraints from the upper part of (5) read as

$$\begin{aligned} \delta _A+\delta _B\ge & {} 1\nonumber \\ \delta _A+\delta _C\ge & {} 1\nonumber \\ \delta _B+\delta _D\ge & {} 1\nonumber \\ \delta _C+\delta _D\ge & {} 1. \end{aligned}$$
(7)

Consequently,

$$\begin{aligned} D_1(\tau ,\tau ')\ge \delta _A+\delta _B+ \delta _C+\delta _D\ge 2. \end{aligned}$$

The graph \(G_1\) from Example 1 realises this lower bound.

For \(i=\infty \), (7) immediately shows

$$\begin{aligned} 2 D_\infty (\tau ,\tau ')\ge 1 \end{aligned}$$

or \(D_\infty (\tau ,\tau ')\ge \frac{1}{2}\). This lower bound is realised by \(\delta _A=\delta _B=\delta _C=\delta _D=\frac{1}{2}\):

For \(i=2\), (7) gives \(\delta _A^2+\delta _C^2\ge \frac{(\delta _A+\delta _C)^2}{2}=\frac{1}{2}\). The same calculation for BD yields

$$\begin{aligned} \delta _A^2+\delta _B^2+\delta _C^2+\delta _D^2\ge 1 \end{aligned}$$

or \(D_2(\tau ,\tau ')\ge 1\). \(G_2\) realises this lower bound, too.

The upper bounds on the absolute differences were not used in the example. Interestingly, they are not important in general:

Theorem 3

For \(\rho ,\rho '\in M(X)\) \(\tilde{D}_i(\rho ,\rho ')\) is the solution of the convex program

$$\begin{aligned} \left||\delta \right||_i\rightarrow & {} \min \quad {\text {under}}\nonumber \\ \delta _x+\delta _y\ge & {} \left|\rho (x,y)-\rho '(x,y)\right|x,y\in X. \end{aligned}$$
(8)

In Isbell (1964), see also Dress (1984), the set of these constraints for \(\rho '=0\) was studied thoroughly. Our proof is obtained by adapting some arguments from Isbell (1964).

Proof

By sublevel compactness for \(\left||\cdot \right||_i\) there exists a minimal point \(\delta \in \mathbb {R}_{\ge 0}^X\) of (8). We show by contradiction that for this \(\delta \) the second part of (5) is fulfilled as well. This implies coincidence of the solutions of (8) and (6).

So let us fix \(x,y\in X\) with

$$\begin{aligned} \delta _x>\delta _y+ \rho (x,y)+\rho '(x,y). \end{aligned}$$

We define \(\delta ^*\in \mathbb {R}_{\ge 0}^X\) by

$$\begin{aligned} \delta ^*_z=\left\{ \begin{array}{ll} \delta _z&{}\quad z\ne x\\ \delta _y+ \rho (x,y)+\rho '(x,y)&{}\quad z=x. \end{array} \right. \end{aligned}$$

Clearly, \(0\le \delta ^*_z\le \delta _z\) for all \(z\in X\) with strict second inequality for \(z=x\). Thus, \(\left||\delta ^*\right||_i<\left||\delta \right||_i\).

First we see

$$\begin{aligned} \delta _x^*+\delta ^*_y=\delta _y+ \rho (x,y)+\rho '(x,y)+\delta _y\ge \left|\rho (x,y)-\rho '(x,y)\right|. \end{aligned}$$

Fix now an arbitrary \(u\in X\), \(u\ne x,y\). The triangle inequality shows

$$\begin{aligned} \delta _x^*+\delta ^*_u= & {} \delta _y+ \rho (x,y)+\rho '(x,y)+\delta _u\\\ge & {} \left|\rho (y,u)-\rho '(y,u)\right|+\left|\rho (x,u)-\rho (y,u)\right|+\left|\rho '(x,u)-\rho '(y,u)\right|\\\ge & {} \left|\rho (y,u)-\rho '(y,u)+\rho (x,u)-\rho (y,u)+\rho '(x,u)-\rho '(y,u)\right|\\= & {} \left|\rho (x,u) -\rho '(x,u)\right|. \end{aligned}$$

Thus, \(\delta ^*\) fulfils all constraints from (8). \(\left||\delta ^*\right||_i<\left||\delta \right||_i\) contradicts that \(\delta \) is optimal for (8). \(\square \)

5 Comparison to Other Metrics

First we compare our metrics to the pathwise difference metrics, defined by

$$\begin{aligned} \tilde{D}^{pd}_i(\rho ,\rho ')=\left||(\rho (x,y)-\rho '(x,y))_{\left\{ x,y\right\} \in \left( {\begin{array}{c}X\\ 2\end{array}}\right) }\right||_i \end{aligned}$$
(9)

on M(X). On T(X), we set \(D^{pd}_i(\tau ,\tau ')= \tilde{D}^{pd}_i(\rho _\tau ,\rho _{\tau '})\). \(D^{pd}_1\) and \(D^{pd}_2\) were defined in Williams and Clifford (1971) and Steel and Penny (1993), respectively. \(\tilde{D}^{pd}_\infty \) is just the distortion of the identity map in the theory of metric spaces (Burago et al. 2001; Lang et al. 2013). To your knowledge, it was used in Huggins et al. (2012) under the name k-interval cospeciation the first time.

Abbreviating \(n=\#X\) we have standard estimates between our metrics for different i, resulting from similar estimates for the norms \(\left||\cdot \right||_i\), first.

Lemma 4

For \(\rho ,\rho '\in M(X)\), it holds

$$\begin{aligned} \tilde{D}_1(\rho ,\rho ')\ge & {} \tilde{D}_2(\rho ,\rho ')\ge \tilde{D}_\infty (\rho ,\rho ')\ge \frac{1}{\sqrt{n}} \tilde{D}_2(\rho ,\rho ')\ge \frac{1}{n}\tilde{D}_1(\rho ,\rho ')\\ \tilde{D}^{pd}_1(\rho ,\rho ')\ge & {} \tilde{D}^{pd}_2(\rho ,\rho ')\ge \tilde{D}^{pd}_\infty (\rho ,\rho ')\ge \frac{1}{\sqrt{\left( {\begin{array}{c}n\\ 2\end{array}}\right) } } \tilde{D}^{pd}_2(\rho ,\rho ')\ge \frac{1}{\left( {\begin{array}{c}n\\ 2\end{array}}\right) } \tilde{D}^{pd}_1(\rho ,\rho '). \end{aligned}$$

Theorem 4

For \(\rho ,\rho '\in M(X)\), it holds

$$\begin{aligned} \frac{n}{2} \tilde{D}^{pd}_1(\rho ,\rho ') \ge&\tilde{D}_1(\rho ,\rho ')&\ge \frac{1}{n-1} \tilde{D}^{pd}_1(\rho ,\rho ')\end{aligned}$$
(10)
$$\begin{aligned} \frac{\sqrt{n}}{2} \tilde{D}^{pd}_2(\rho ,\rho ') \ge&\tilde{D}_2(\rho ,\rho ')&\ge \sqrt{\frac{1}{2(n-1)}} \tilde{D}^{pd}_2(\rho ,\rho ')\end{aligned}$$
(11)
$$\begin{aligned} \tilde{D}_\infty (\rho ,\rho ')= & {} \frac{1}{2} \tilde{D}^{pd}_\infty (\rho ,\rho '). \end{aligned}$$
(12)

(12) reminds of the fact that the Gromov–Hausdorff distance of two compact metric spaces is one half of the infimum over the distortions of correspondences between the two spaces (Burago et al. 2001, Theorem 7.3.25). Further, (12) shows that we need not solve a linear program for computing \(\tilde{D}_\infty \).

Proof

We choose a minimal point \(\delta \) of (8), i.e. \(\left||\delta \right||_i=\tilde{D}_i(\varrho ,\varrho ')\). Like in Example 2 we get for all \(x\ne y\in X\)

$$\begin{aligned} \delta _x+\delta _y\ge & {} \left|\rho (x,y)-\rho '(x,y)\right|\end{aligned}$$
(13)
$$\begin{aligned} \delta ^2_x+\delta ^2_y\ge \frac{1}{2}( \delta _x+\delta _y)^2\ge & {} \frac{1}{2}\left|\rho (x,y)-\rho '(x,y)\right|^2\end{aligned}$$
(14)
$$\begin{aligned} \max \left\{ \delta _x:x\in X\right\} \ge \frac{1}{2}( \delta _x+\delta _y)\ge & {} \frac{1}{2}\left|\rho (x,y)-\rho '(x,y)\right|. \end{aligned}$$
(15)

Let \(i=\infty \). Taking the maximum of (15) over all \(\left\{ x,y\right\} \in \left( {\begin{array}{c}X\\ 2\end{array}}\right) \) yields

$$\begin{aligned} \tilde{D}_\infty (\rho ,\rho ')= & {} \left||\delta \right||_\infty =\max \left\{ \delta _x:x\in X\right\} \\\ge & {} \frac{1}{2}\max \left\{ \left|\rho (x,y)-\rho '(x,y)\right|:x,y\in X\right\} =\frac{1}{2} \tilde{D}^{pd}_\infty (\rho ,\rho '). \end{aligned}$$

Now define \(\bar{\delta }\in \mathbb {R}_{\ge 0}^X\) by setting

$$\begin{aligned} \bar{\delta }_z=\frac{1}{2}\max \left\{ \left|\rho (x,y)-\rho '(x,y)\right|:x,y\in X\right\} ,\quad z\in X. \end{aligned}$$

The constraints from (8) are clearly fulfilled for \(\bar{\delta }\). Evaluating \(\left||\bar{\delta }\right||_\infty \) gives

$$\begin{aligned} \tilde{D}_\infty (\rho ,\rho ')\le & {} \left||\bar{\delta }\right||_\infty \\= & {} \frac{1}{2}\max \left\{ \left|\rho (x,y)-\rho '(x,y)\right|:x,y \in X\right\} =\frac{1}{2} \tilde{D}^{pd}_\infty (\rho ,\rho ') \end{aligned}$$

and (12) is proved.

For \(i=1\), we sum (13) over all \(\left\{ x,y\right\} \in \left( {\begin{array}{c}X\\ 2\end{array}}\right) \). This yields

$$\begin{aligned} (n-1)\tilde{D}_1(\varrho ,\varrho ')= & {} (n-1)\sum _{x\in X}\delta _x\\\ge & {} \sum _{\left\{ x,y\right\} \in \left( {\begin{array}{c}X\\ 2\end{array}}\right) } \left|\rho (x,y)-\rho '(x,y)\right|=\tilde{D}^{pd}_1(\varrho ,\varrho '), \end{aligned}$$

the lower bound in (10). From Lemma 4, we derive the upper bound:

$$\begin{aligned} \tilde{D}_1(\rho ,\rho ')\le n \tilde{D}_\infty (\rho ,\rho ')=\frac{n}{2} \tilde{D}^{pd}_\infty (\rho ,\rho ')\le \frac{n}{2} \tilde{D}^{pd}_1(\rho ,\rho '). \end{aligned}$$

For \(i=2\), the lower bound in (11) is proved similarly to \(i=1\) using (14). The upper bound follows again from the previous lemma:

$$\begin{aligned} \tilde{D}_2(\rho ,\rho ')\le \sqrt{n} \tilde{D}_\infty (\rho ,\rho ')=\frac{\sqrt{n}}{2} \tilde{D}^{pd}_\infty (\rho ,\rho ')\le \frac{\sqrt{n}}{2} \tilde{D}^{pd}_2(\rho ,\rho '). \end{aligned}$$

\(\square \)

We want to demonstrate now that the new metrics are biologically meaningful. Especially we show that an NNI (nearest neighbour interchange) move is a relatively small step in tree space \(T^2_1(X)\) when measured by these metrics. An NNI move (Allen and Steel 2001) is given by

or

where denote different subtrees. The minimal number of NNI moves to reach \(\tau '\in T_1^2(X)\) from \(\tau \in T_1^2(X)\) is their NNI distance \(D^{NNI}(\tau ,\tau ')\) (Robinson 1971).

Theorem 5

Consider \(\tau ,\tau '\in T^2_1(X)\) with \(D^{NNI}(\tau ,\tau ')=1\). Then,

$$\begin{aligned} D_1(\tau ,\tau ')\le & {} \frac{n}{2}\\ D_2(\tau ,\tau ')\le & {} \frac{\sqrt{n}}{2}\\ D_\infty (\tau ,\tau ')= & {} \frac{1}{2} \end{aligned}$$

Consequently, for all \(\tau ,\tau '\in T^2_1(X)\)

$$\begin{aligned} D^{NNI}(\tau ,\tau ')\ge 2 D_\infty (\tau ,\tau ')\ge \frac{2}{\sqrt{n}}D_2(\tau ,\tau ')\ge \frac{2}{n} D_1(\tau ,\tau '). \end{aligned}$$

Note that these formulae give estimates of the gradient of the metrics \(D_i\) in the sense of Lin et al. (2012).

Proof

Specifically we consider

Let \(A'\) be the set of labels mapped into the subtrees and let \(B'\) be the set of labels mapped into the subtrees . Then,

$$\begin{aligned} \left|\rho _{\tau }(x,y)-\rho _{\tau '}(x,y)\right|=\left\{ \begin{array}{ll} 1&{}\quad x\in A',y\in B'\\ 1&{}\quad x\in B',y\in A'\\ 0&{}\quad \text{ otherwise } \end{array} \right. \end{aligned}$$

Theorem 4 yields directly \(D_\infty (\tau ,\tau ')=\frac{1}{2}\).

For \(\delta \in \mathbb {R}_{\ge 0}^X\) fulfilling the constraints in (8), define \(\delta ^*\in \mathbb {R}_{\ge 0}^X\) by

$$\begin{aligned} \delta ^*_x=\left\{ \begin{array}{ll} \tilde{\delta }_A= \frac{1}{\#A'}\sum _{y\in A'}\delta _y&{}\quad x\in A'\\ \tilde{\delta }_B= \frac{1}{\#B'}\sum _{y\in B'}\delta _y&{}\quad x\in B' \end{array} \right. \end{aligned}$$

Neither permutations of labels in \(A'\) nor in \(B'\) change the absolute difference of the metrics. Thus, \(\delta ^*\) fulfils the constraints, too. By convexity, \(\left||\delta ^*\right||_i\le \left||\delta \right||_i\) for \(i=1,2\).

For \(i=1\) optimisation among all vectors of the form, \(\delta ^*\) means to solve the linear program

$$\begin{aligned} \#A'\tilde{\delta }_{A}+\#B'\tilde{\delta }_{B}\rightarrow & {} \min \quad \text{ under }\\ \tilde{\delta }_{A}+\tilde{\delta }_{B}\ge & {} 1. \end{aligned}$$

Its solution is \(1-\tilde{\delta }_{B}=\tilde{\delta }_{A}=\left\{ \begin{array}{l}1 \#A' <\#B'\\ 0 \#A' \ge \#B' \end{array} \right. \) with objective value

$$\begin{aligned} D_1(\tau ,\tau ')=\min (\#A',\#B')\le \frac{n}{2}. \end{aligned}$$

For \(i=2\), we have to solve

$$\begin{aligned} \#A'\tilde{\delta }_{A}^2+\#B'\tilde{\delta }_{B}^2\rightarrow & {} \min \quad \text{ under }\\ \tilde{\delta }_{A}+\tilde{\delta }_{B}\ge & {} 1. \end{aligned}$$

Now the minimum is realised by \(\tilde{\delta }_{A}=\frac{\#B'}{n}\) and \(\tilde{\delta }_{B}=\frac{\#A'}{n}\) with value

$$\begin{aligned} D_2(\tau ,\tau ')^2=\frac{\#A'\#B'}{n}\le \frac{n}{4}. \end{aligned}$$

The definition of the NNI distance and Lemma 4 imply the second hypothesis. \(\square \)

Note that different NNI moves have different \(D_i\)-length for \(i=1,2\) in general.

Now we show that the \(D_i\)-distance of \(\tau ,\tau '\in T^2_1(X)\) which are one NNI move apart is small compared to the diameter of the space \(T^2_1(X)\). For the latter, we have upper bounds which hold on \(T_1(X)\) as well.

Lemma 5

For all \(\tau ,\tau '\in T_1(X)\), it holds

$$\begin{aligned} D_1(\tau ,\tau ')\le & {} n\cdot \frac{n-2}{2}\\ D_2(\tau ,\tau ')\le & {} \sqrt{n}\cdot \frac{n-2}{2}\\ D_\infty (\tau ,\tau ')\le & {} \frac{n-2}{2} \end{aligned}$$

Proof

All shortest paths in \(\tau \) and \(\tau '\) have at least one and at most \(n-1\) edges. Thus, \(D^{pd}_\infty (\tau ,\tau ')\le {n-2}\). By Theorem 4, \(D_\infty (\tau ,\tau ')\le \frac{n-2}{2}\). Lemma 4 implies the other two bounds. \(\square \)

We want to show now that the estimates in Lemma 5 have the correct order in the number of taxa n. The examples are caterpillars, i.e. binary trees for which all interior vertices form a chain.

Lemma 6

Let us be given \(n=4m+1\) for some \(m\in \mathbb {N}\), \(m\ge 1\), \(X=\left\{ 1,2,\dots ,4m,4m+1\right\} \). Suppose \(\tau \) is an unrooted (binary) caterpillar tree with cherries \(\left\{ 1,2\right\} \) and \(\left\{ 4m,4m+1\right\} \):

and \(\tau '\) is obtained from \(\tau \) by reversing the order of the even labels, i.e. 2j is interchanged with \(2(2m+1-j)\) for \(j=1,\dots ,2m\):

Then,

$$\begin{aligned} D_1(\tau ,\tau ')\ge & {} 4m^2\\ D_2(\tau ',\tau ')\ge & {} \sqrt{\frac{16}{3}m^3-\frac{4}{3}m}\\ D_\infty (\tau ,\tau ')= & {} 2m-1. \end{aligned}$$

Proof

It is easy to see that for \(1\le x<y\le n=4m+1\)

$$\begin{aligned} \rho _\tau (x,y)=\left\{ \begin{array}{cl} 2&{}\quad x=1,y=2\\ y&{}\quad x=1,2,\quad 3\le y\le 4m-1\\ 4m&{}\quad x=1,2,\quad y=4m,4m+1\\ 4m+2-x&{}\quad 3\le x\le 4m-1,\quad y=4m,4m+1\\ 2&{}\quad x=4m,y=4m+1\\ y-x+2&{}\quad \text{ otherwise } \end{array} \right. \end{aligned}$$

By construction,

$$\begin{aligned} \rho _{\tau '}(x,y)=\left\{ \begin{array}{cl} \rho _\tau (x,y) &{}\quad x\equiv y\pmod 2\\ \rho _\tau (x,4m+2-y)&{}\quad \text{ otherwise } \end{array} \right. \end{aligned}$$

First, the formula for \(D_\infty (\tau ,\tau ')\) follows immediately from Theorem 4.

Continuing, (8) contains the constraints \(\delta _{2j-1}+\delta _{2j}\ge \left|4(m-j)+2\right|\) and \(\delta _{4m+2-2j}+\delta _{4m+3-2j}\ge \left|4(m-j)+2\right|\), \(1\le j\le m\). Summing up these constraints gives the lower bound for \(D_1(\tau ,\tau ')\).

Applying the inequality \(a^2+b^2\ge \frac{(a+b)^2}{2}\) to the same constraints and again summing up these inequalities yield the lower bound for \( D_2(\tau ,\tau ')\). \(\square \)

6 Local Properties

From Theorem 3, we obtain for all \(\rho ,\rho ',\rho ''\in M(X)\)

$$\begin{aligned} \tilde{D}_i(\rho +\rho ',\rho +\rho '')= \tilde{D}_i(\rho ',\rho ''). \end{aligned}$$

In general, the sum of two tree-induced semimetrics is not a tree-induced semimetric. But, if \(\rho ',\rho ''\) result from simple manipulations of the edge lengths in the tree \(\tau \) corresponding to \(\rho =\rho _\tau \), some computations are possible. They inform us about the local behaviour of the metrics \(D_1,D_2\).

First we compute the influence of changing one edge length. Usually edges of an X-tree are described by the split they induce on X. Deleting an edge decomposes the tree into two connected components. Then, the induced split is the corresponding bipartition of X. Bipartitions are denoted A|B with \(A,B\subset X\), \(A\cup B= X\), \(A\cap B=\emptyset \).

Example 3

Consider for \(l>0\) the unresolved weighted X-trees

Thus, A|B is a split of X and l is the length of the edge inducing this split. We are interested in \(D_i(\tau _{A,B}^l,\tau _{A,B}^{l'})\). By the preceding considerations, these are the distances between two weighted X-trees displaying the same splits with the same length except the split A|B, where the lengths are l and \(l'\).

We see that the constraints from (8) turn into

$$\begin{aligned} \delta _x+\delta _y\ge |l-l'|\quad x \in A, y\in B . \end{aligned}$$

Using the same arguments as in the proof of Theorem 5, we may assume that

$$\begin{aligned} \delta _x=\left\{ \begin{array}{ll} a&{}\quad x\in A\\ b&{}\quad x\in B \end{array} \right. \end{aligned}$$

for some \(a,b\in \mathbb {R}_{\ge 0}\) with \(a+b\ge \left|l-l'\right|\).

For \(i=1\), we find

$$\begin{aligned} \left||\delta \right||_1=\#A a+\# Bb\ge \#A a+\# B \left( \left|l-l'\right|-a\right) . \end{aligned}$$

The minimal value of the latter function of a in \(\left[ 0,\left|l-l'\right|\right] \) is

$$\begin{aligned} D_1(\tau _{A,B}^l,\tau _{A,B}^{l'})=\min (\#A,\# B) \left|l-l'\right|. \end{aligned}$$

It is attained at \(\left|l-l'\right|-b=a=\left\{ {\begin{array}{l} 0 \#A\ge \#B \\ \left|l-l'\right|\#A\le \#B \end{array}}\right. \).

Similarly, we have to minimise for \(i=2\)

$$\begin{aligned} \left||\delta \right||_2^2=\#A a^2+\# Bb^2\ge \#A a^2+\# B \left( \left|l-l'\right|-a\right) ^2. \end{aligned}$$

Now the minimum is attained at \(a=\frac{\#B}{\#A+\#B}\left|l-l'\right|\), \(b=\frac{\#B}{\#A+\#B}\left|l-l'\right|\) as

$$\begin{aligned} D_2(\tau _{A,B}^l,\tau _{A,B}^{l'})=\sqrt{\frac{\#A\# B}{n}} \left|l-l'\right|. \end{aligned}$$

Summarisingly, we observe that different splits of a tree may contribute with different strengths to the distance. This differs from the behaviour of the geodesic distance.

Note that the above computations imply estimates for the Robinson–Foulds metric similar to Theorem 5.

Now we change two edge lengths simultaneously. Let \(\tau _0\in T(X)\) denote the tree with one vertex and without edges. Thus, the label function \(\mu \) maps to a single point and \(\rho _{\tau _0}=0\).

Example 4

Let \(l,l'>0\) and pairwise disjoint \(A,B,C\subset X\), \(A\cup B\cup C=X\), be given. We want to compute \(D_i(\tau _0,\tau _{A,B,C}^{l,l'})\) where

This tree captures the “difference” of two trees with the same shape which differ in the lengths of two edges.

Again, symmetry allows us to consider only \(\delta \in \mathbb {R}_{\ge 0}^X\) with

$$\begin{aligned} \delta _x=\left\{ {\begin{array}{ll} a&{}\quad x\in A\\ b&{}\quad x\in B\\ c&{}\quad x\in C \end{array}} \right. \end{aligned}$$

for some \(a,b,c\in \mathbb {R}_{\ge 0}\) which fulfil now

$$\begin{aligned} a+b\ge & {} l\nonumber \\ b+c\ge & {} l'\nonumber \\ a+c\ge & {} l+l'. \end{aligned}$$
(16)

This yields a linear program or a quadratic program in \(\mathbb {R}_{\ge 0}^3\).

For computing \( D_1(\tau _0,\tau _{A,B,C}^{l,l'})\), we want

$$\begin{aligned} \#Aa+\#Bb+\#Cc\mapsto \min \end{aligned}$$

under the constraints (16). We know that this minimum is achieved in a corner of the feasible set. But, we see easily that not all inequalities in (16) could be equalities unless \(b=0\). Thus, at least one of abc must be zero and we obtain the minimal value as

$$\begin{aligned} \min \left\{ \#A l+\#Cl',(\#B+\#C)l+\#C l',\#A l+(\#A +\#B)l'\right\} \end{aligned}$$

A distinction of cases whether \(\#A\gtreqless \#B+\#C\) and \(\#C\gtreqless \#A+\#B\) gives us in any case one of the values as minimum. Thus, in any case, \( D_1(\tau _0,\tau _{A,B,C}^{l,l'})\) is a linear combination of l and \(l'\), i.e. some weighted \(\ell ^1\)-distance.

The computation of \( D_2(\tau _0,\tau _{A,B,C}^{l,l'})\) would mean solving the quadratic program

$$\begin{aligned} \#Aa^2+\#Bb^2+\#Cc^2\mapsto \min \end{aligned}$$

under the constraints (16). For this problem, we only know that the solution is the projection of the null vector onto the affine hyperspace determined by some face of the feasible set. This projection is linear in l and \(l'\). This means that \( \left( D_2(\tau _0,\tau _{A,B,C}^{l,l'})\right) ^2\) is the minimum of five quadratic functions in \(l,l'\). Since the algebra is rather tedious, we stop here now with the indication that this minimum is just a single quadratic function similar to the linear case before. A numerical test for several cardinalities and random lengths \(l,l'\) showed that the parallelogram equality is fulfilled in all considered situations (data not shown, see https://math-inf.uni-greifswald.de/fileadmin/uni-greifswald/fakultaet/mnf/mathinf/liebscher/phylodistpaper4.R). Thus, the local geometry under \(D_2\) seems to be euclidean. Note that the previous example showed that \(D_2\) is not a version of the geodesic distance from Billera et al. (2001) for unrooted trees.

7 Implementation and Numerical Examples

We did a small simulation study on a Intel\(\circledR \) i7-7700 3,60GHz PC running Ubuntu 16.04 to get some empirical insight extending our mathematical results.

The different metrics were implemented in R (R Core Team 2017) and form now the gromovlab package (Liebscher 2015). For the geodesic distance, the implementation by the package distory (Chakerian and Holmes 2017) was used. To root trees, the first taxon was marked as outgroup. The weighted matching distance was implemented using the package lpSolve (Berkelaar et al. 2015). Random (weighted and unweighted) trees were generated by the function rtree of the R package ape (Paradis et al. 2004). This function generates uniformly distributed binary trees with uniformly in [0, 1] distributed edge lengths in the weighted case. To avoid a potential bias, labels were randomly permuted afterwards. Random caterpillars were generated by random labelling of the caterpillar tree generated by the stree function of the package ape. The corresponding R-script can be downloaded from https://math-inf.uni-greifswald.de/fileadmin/uni-greifswald/fakultaet/mnf/mathinf/liebscher/phylodistpaper4.R.

Some testing showed in the \(\ell ^1\)-case best performance in terms of computing time for the dual simplex algorithm. The computing time for obtaining the distance between random trees of size \(n=100\) was around 0.1 s. This compares to the computing times of the geodesic distance and the weighted matching distance, see Fig. 2. Of course, computations of the Robinson–Foulds distance and the pathwise difference metrics are faster.

Fig. 2
figure 2

Boxplot of computing times for different metrics (logarithmic scale) for \(10^3\) random weighted X-trees with \(n=\#X=100\). From left: \(D_1\), \(D_2\), \(D^{pd}_\infty \), \(D^{pd}_1\), \(D^{pd}_2\), the geodesic, the Robinson–Foulds and the weighted matching distance. All times were rounded to milliseconds

We also compared the values of \(D_i\), \(i=1,2,\infty \) with the pathwise difference metrics [see (9)], the geodesic distance and the Robinson–Foulds metric, for random weighted binary X-trees with \(n=10\) leaves. The resulting scatterplots are presented in Fig. 3. One can observe correlations only among the different Gromov-type metrics and among the different pathwise difference metrics. There is not much correlation to the geodesic distance. This shows that our metrics differ essentially from the pathwise difference and the geodesic metrics. Further, it is not determined by any of those metrics in the sense of Coons and Rusinko (2016).

Similar pictures are found for random unweighted trees, see Fig. 4. Now there is a strong correlation to the weighted matching distance. Interestingly, \(D_1 \) turns out to be integer-valued now, see Fig. 5. That is surprising since the matrix corresponding to the linear program (8) is not totally unimodular in the sense of Hoffman and Kruskal (2010), it contains the \(3\times 3\) submatrix \( \begin{pmatrix} 1&{}1&{}0\\ 1&{}0&{}1\\ 0&{}1&{}1 \end{pmatrix}\) with determinant \(-2\). Note that for general integer-valued metrics, \(\tilde{D}_1\) assumes also half-integer values (https://math-inf.uni-greifswald.de/fileadmin/uni-greifswald/fakultaet/mnf/mathinf/liebscher/phylodistpaper4.R). The lower bound from Lemma 6 computes to \(\frac{(n-1)^2}{4}\approx 21\). Obviously, it is not sharp. Random caterpillars provide a similar distribution with only even values and more extreme to the right (data not shown) (https://math-inf.uni-greifswald.de/fileadmin/uni-greifswald/fakultaet/mnf/mathinf/liebscher/phylodistpaper4.R).

Fig. 3
figure 3

Comparison of different metrics for \(10^3\) random weighted X-trees with \(n=\#X=10\). From upper left: \(D_1\), \(D_2\), \( D^{pd}_\infty \), \(D^{pd}_1\), \(D_2^{pd}\), the geodesic, the Robinson–Foulds and the weighted matching distance

Fig. 4
figure 4

Comparison of different metrics for \(10^3\) random unweighted trees with \(n=10\). From upper left: \(D_1\), \(D_2\), \(D^{pd}_\infty \), \(D^{pd}_1\), \(D_2^{pd}\), the geodesic, the Robinson–Foulds and the weighted matching distance

Fig. 5
figure 5

Frequency tables of the \(D_1\) metric for \(10^3\) random unweighted trees with \(n=10\). The formal lower bound on the diameter of T(X) from Lemma 6 is added as dotted line

8 Discussion

We constructed three different well motivated metrics on the space M(X) of semimetrics on the taxon set X. This leads to at least two new efficiently computable metrics for comparing unrooted, but possibly weighted, phylogenetic X-trees. There is an obvious interpretation of the metrics by parsimonious consistent matching of the differences entailed by the two trees, see Example 2. Since we showed that NNI moves are small in these metrics compared to the whole space of binary trees, these metrics surely capture some biological similarity. We think this rather abstract approach to tree metrics is valuable and could generalise well. One direction could be the extension to rooted trees. We should then just measure the distance of the induced metrics on \(X\cup \left\{ root\right\} \). Another generalisation could focus phylogenetic networks.

In general, we follow Steel and Penny (1993) in arguing that there is no universal metric for phylogenetic trees which suits perfectly for all purposes. We think that every application has its own choice, and we added a further choice to this portfolio. Yet, we should discuss further properties of phylogenetic metrics to guide the users. Monotonicity as considered in Allen and Steel (2001), Lemma 2.2 is a start in this direction. Here we want to discuss some results of the present paper and possible extensions only.

It looks interesting to extend the metric to tree shapes, with allowing the labels to be permuted. But computation of the general Gromov–Hausdorff distance is NP-hard (Pardalos and Wolkowicz 1994), and the same result holds for tree-induced metrics (Agarwal et al. 2015).

One important topic which raised up already in Bogdanowicz and Giaro (2012), Lin et al. (2012), Gavryushkin and Drummond (2016), and Kendall and Colijn (2016) is the question how to weight the edges of the trees. We computed the influence of different splits on our metric in Example 3. If those weights do not fit the intention of the user, one could change the tree-induced metrics by rescaling the edges of the trees in an objective way. Using weighted \(\left||\cdot \right||_i\) norms can account for uneven taxon sampling or rooting the tree. The principle of the computations would remain the same. Note that we met already such weights in Examples 3 and 4. Further, also a Kantorovich–Wasserstein approach similar to Mémoli (2007) might be feasible if the weights of the taxa differ between the trees. Thus, our approach is natural, but can be well adjusted to the needs of applications.

We compared the new metrics with the NNI metric, the pathwise difference metrics, the Robinson–Foulds metric (see Example 3). Not many of the estimates are tight. So it would be valuable to get tight lower and upper bounds in these cases and for the quartet, SPR-, TBR-, maximum parsimony, weighted matching and geodesic metrics as well. It is important to know more about the 1-neighbourhoods on \(T_1^2(X)\), e.g. whether there are islands in the sense of Bogdanowicz and Giaro (2012). Our numerical study in Sect. 7 is still sparse.

We expect the diameter between two unweighted X-trees to be realised by caterpillar trees. The simulation result in Fig. 5 points into this direction. We would like to know why \(D_1\) takes integer values only on \(T_1^2(X)\).

The geometry induced by \(D_2\) needs to be explored. Is it locally fully euclidean? How do the geodesics look like?

Outside phylogenetics, there should be applications to other kinds of finite labelled metric spaces. At the moment, we are only aware of the papers of F.Memoli, e.g. Mémoli (2007), which deal with \(\ell ^p\)-type Gromov–Hausdorff metrics.

In spite of these many open questions, we are sure that this work is just the start of studying this interesting kind of metrics.