1 Introduction

Comparing tree-structured data such as HTML and XML data for web mining or DNA and glycan data for bioinformatics is one of the important tasks for data mining. The most famous distance measure between trees is the edit distance [2, 9, 14]. The edit distance is formulated as the minimum cost of edit operations, consisting of a substitution, a deletion and an insertion, applied to transform from a tree to another tree.

It is known that the edit distance is closely related to the notion of a Tai mapping (mapping, for short) [14], which is a one-to-one node correspondence between trees preserving ancestor (and sibling) relations. Note that the corresponding nodes with different labels through a mapping are regarded as nodes applied to substitution and the non-corresponding nodes to deletion in a tree or insertion in another tree. Then, the minimum cost of possible Tai mappings coincides with the edit distance [14].

Whereas the edit distance is the standard measure for comparing trees, it is too general for several applications. Therefore, more structurally sensitive distances of the edit distance are required for these applications. Such distances are formulated as the minimum cost of the variations of the Tai mapping such as a top-down mapping (Top) [3, 13], an LCA (least common ancestor) -preserving (or degree-2) mapping (Lca) [22], an accordant (or Lu’s) mapping (Acc) [9, 11], an isolated-subtree (or constrained) mapping (Ilst) [17, 19, 20], a less-constrained mapping (Less) [10], an alignable mapping (Aln) [9], a bottom-up mapping (Bot) [9, 15, 18], a segmental mapping (Sg) [7] and a top-down segmental mapping (TopSg) [7], respectively.

The above mappings provide a Tai mapping hierarchy illustrated in Fig. 1, which grows from left to right by adding new variations of the Tai mapping. Here, Iso denotes an isomorphism and it holds that “Aln = Less” [9] and “Top = TopSg” [7].

Fig. 1
figure 1

Tai mapping hierarchies introduced by Wang and Zhang [17] (left), Kuboyama [9] (center) and Kan et al. [7] (right)

The Tai mapping hierarchy in Fig. 1 just represents the inclusion relation of mappings but no other properties of mappings and, in particular, the right column in Fig. 1 (left) is sparser than the left column. Then, there arise a question whether or not there exists a unifying property between the variations of a Tai mapping.

In order to solve the above question, in this paper, we characterize the Tai mapping hierarchy as a common subforest between two trees consisting of pairs of nodes in the variations of a Tai mapping. Then, we focus on the connections of nodes and the arrangements of subtrees in a common subforest. Note that the largest common subforest or tree provides another similarity measure between trees [1, 5, 9, 16, 21].Footnote 1

First, we focus on the following three connections of nodes in a common subforest. A common embedded subforest is a common subforest by connecting nodes in a mapping according to ancestor-descendant relation. A common induced subforest is a common subforest induced by a set of nodes. A common complete subforest is a common induced subforest containing all the descendants of a set of nodes. Then, we can characterize the mappings in Tai, Aln, Ilst, Acc and Lca as common embedded subforests, the mappings in Sg and Top as common induced subforests and mappings in Bot as common complete subforests, respectively. The mappings characterized as common induced subforests require the operational condition that the deletion and the insertion are allowed to apply just roots or leaves and the mappings characterized as common complete subforests require the operational condition that the deletion and the insertion are allowed to apply just roots.

Next, we focus on the four arrangements of subtrees in a common subforest. A common subforest is non-twisting if there exists no triple of nodes in a common subforest such that the LCA of the first and the second nodes is an ancestor of the LCA of the second and third nodes in a tree iff the LCA of the second and the third nodes is an ancestor of the LCA of the first and second nodes in another tree. A common subforest is parallel if, for every triple of nodes in a common subforest, the LCA of the first and the second nodes is equal to the LCA of the first and the third nodes in a tree iff the LCA of the first and the second nodes is equal to the LCA of the first and the third nodes in another tree. A common subforest is a subtree if it is a tree. A common subtree is root-preserving if the root of the common subtree is equal to the roots of both trees. Then, we can characterize the mappings in Tai, Sg and Bot as an arbitrary subforest, the mappings in Aln as a non-twisting subforest, the mappings in Ilst and Acc as a parallel subforest, the mappings in Lca as a subtree and the mappings in Top and Iso as a root-preserving subtree, respectively.

As a result, the Tai mapping hierarchy in Fig. 1 is not complete for the connections of nodes and the arrangements of subtrees in a common subforest. In order to fill a gap in the Tai mapping hierarchy in Fig. 1, in this paper, we introduce new mappings by intersecting Sg and Bot in the right column in Fig. 1 with Aln, Ilst, Acc and Lca in the left column in Fig. 1.

As intersecting Sg, we introduce a segmental alignable mapping (SgAln), an isolated-subtree segmental mapping (IlstSg), an accordant segmental mapping (AccSg) and an LCA-preserving segmental mapping (LcaSg). As intersecting Bot, we introduce a bottom-up alignable mapping (BotAln), an isolated-subtree bottom-up mapping (IlstBot), an accordant bottom-up mapping (AccBot) and an LCA-preserving bottom-up mapping (LcaBot). Additionally, we introduce an LCA- and root-preserving mapping (LcaRt).

Hence, we provide a new Tai mapping hierarchy illustrated in Fig. 2, where the previous mappings in Fig. 1 are illustrated as the nodes enclosed by gray lines and the new mappings are illustrated as the nodes enclosed by black lines. In particular, we show that “AccSg = IlstSg” and “AccBot = IlstBot.”

Fig. 2
figure 2

A new Tai mapping hierarchy consisting of the mappings in Fig. 1 illustrated as the nodes enclosed by gray lines and the new mappings illustrated as the nodes enclosed by black lines

In the vertical direction in Fig. 2, the mappings in the left column (Tai, Aln, Ilst, Acc, Lca and LcaRt), those in the center column (Sg, SgAln, AccSg, LcaSg and Top) and those in the right column are characterized as common embedded, induced and complete subforests, respectively. In the horizontal direction in Fig. 2, on the other hand, the mappings in the top row (Tai, Sg and Bot), those in the second row (Aln, SgAln and BotAln), those in the third row (Ilst, Acc, AccSg and AccBot), those in the fourth row (Lca, LcaSg, LcaBot) and those in the last row (LcaRt, Top and Iso) are characterized as arbitrary subforests, non-twisting subforests, parallel subforests, subtrees and root-preserving subtrees, respectively.

Next, we introduce the variations of the edit distance as the minimum cost of all the possible mappings in SgAln, AccSg, LcaSg, BotAln, AccBot, LcaBot and LcaRt, where we call them a segmental alignment distance, an accordant segmental distance, an LCA-preserving segmental distance, a bottom-up alignment distance, an accordant bottom-up distance, an LCA-preserving bottom-up distance and an LCA- and root-preserving distance. Then, we show that no distances characterized as non-twisting subforests are metrics, whereas the other distances are metrics.

Finally, we summarize and investigate the time complexity of computing the variations of the edit distance illustrated in Table 1. Here, n is the number of nodes in a tree, m is the number of nodes in another tree (nm), D is the maximum degree of two trees and d is the minimum degree of two trees.

Table 1 The metricity of the variations τ𝖠(T 1, T 2) for ℳ𝖠(T 1, T 2) in Fig. 2 and time complexity of computing them

For ordered trees, we can compute the distances characterized as non-twisting subforests in O(n m D 2) time and those as other subforests or subtrees except the (standard) edit distance and the tree isomorphism in O(n m) time. For unordered trees, it is known that the problem of computing the distances characterized as arbitrary subforests is MAX SNP-hard [21] even if trees are binary [5, 18]. On the other hand, as in the case of alignment distance [6], the problems of computing the distances characterized as non-twisting subforests are MAX SNP-hard, whereas they are tractable if the degrees of trees are bounded by some constant. Furthermore, we can compute the distances characterized as parallel subforests, subtrees and root-preserving subtrees in O(n m d) time except the LCA-preserving bottom-up distance, which we can compute in O(n m) time.

2 Edit Distance and Tai Mapping

A tree T is a connected graph (V, E) without cycles, where V is the set of vertices and E is the set of edges and we denote V and E by V(T) and E(T). The size of T is |V| and denoted by |T|. We sometime denote vV(T) by vT. We denote an empty tree (, ) by . A rooted tree is a tree with one node r chosen as its root. We denote the root of a rooted tree T by r(T).

For each node v in a rooted tree with the root r, let UP r (v) be the unique path from v to r. The parent of v( ≠ r), which we denote by par(v), is its adjacent node on UP r (v) and the ancestors of v( ≠ r) are the nodes on UP r (v) − {v}. We denote the set of all ancestors of v by anc(v). We say that u is a child of v if v is the parent of u and u is a descendant of v if v is an ancestor of u. A leaf is a node having no children. We denote the set of all leaves in T by lv(T). A node neither a leaf nor a root is called an internal node. The degree of a node v, denoted by d(v), is the number of children of v. The degree of a rooted tree T, denoted by d(T), is the maximum number of d(v) for every vT.

We use the ancestor orders < and ≤ , that is, u < v if v is an ancestor of u and uv if u < v or u = v. We denote neither uv nor vu by u # v. We say that w is the least common ancestor (LCA, for abbreviation) of u and v, denoted by uv, if uw, vw and there exists no w′ such that w′ < w, uw′ and vw′. A (complete) subtree of T = (V, E)rooted by v, denoted by T[v], is a tree T′ = (V′, E′) such that r(T′) = v, V′ = {uVuv} and E′ = {(u, w) ∈ Eu, wV′}.

For nodes u, vT, u is to the left of v, denoted by uv, if pre(u) ≤ pre(v) for the preorder number pre and post(u) ≤ post(v) for the postorder number post. We say that a rooted tree is ordered if a left-to-right order among siblings is given; unordered otherwise. We say that a rooted tree is labeled if each node is assigned a symbol from a fixed finite alphabet Σ. For a node v, we denote the label of v by l(v), and sometimes identify v with l(v). We call a rooted labeled tree a tree simply.

A forest is a sequence [T 1, … , T n ] of trees. For a tree T and a node iT, T(i) is a forest obtained by deleting the root i in T[i]. We denote the number of trees in a forest F by ||F||, that is, ||F|| = n for F = [T 1, … , T n ].

In the following sections, we will use two trees T 1 and T 2 to obtain the distance between T 1 and T 2. Then, for nodes iT 1 and jT 2, suppose that the children of i and j are i 1, … , i s and j 1, … , j t , respectively. Also, we sometimes denote the forests T 1(i) = [T 1[i 1], … , T 1[i s ]] and T 2(j) = [T 2[j 1], … , T 2[j t ]] by F 1(i 1, i s ) and F 2(j 1, j t ), respectively.

Definition 1

(Edit operations 14) The edit operations of a tree T are defined as follows. See Fig. 3.

  1. 1.

    Substitution: Change the label of the node v in T.

  2. 2.

    Deletion:Footnote 2 Delete a node v in T with parent v′, making the children of v become the children of v′. The children are inserted in the place of v as a subsequence in the left-to-right order (for ordered trees) or a subset (for unordered trees) of the children of v′. In particular, if v is the root in T, then the result applying the deletion is a forest consisting of the children of the root.

  3. 3.

    Insertion: The complement of deletion. Insert a node v as a child of v′ in T making v the parent of a consecutive subsequence (for ordered trees) or a subset (for unordered trees) of the children of v′.

Fig. 3
figure 3

Edit operations for trees

Let ε ∉ Σ denote a special blank symbol and define Σ ε = Σ ∪ {ε}. Then, we represent each edit operation by (l 1l 2), where (l 1, l 2) ∈ (Σ ε × Σ ε − {(ε, ε)}). The operation is a substitution if l 1ε and l 2ε, a deletion if l 2 = ε, and an insertion if l 1 = ε. For nodes v and w, we also denote (l(v) ↦ l(w)) by (vw). We define a cost function γ : (Σ ε × Σ ε − {(ε, ε)}) ↦ R + on pairs of labels. We often constrain a cost function γ to be a metric, that is, γ(l 1, l 2) ≥ 0, γ(l 1, l 2) = 0 iff l 1 = l 2, γ(l 1, l 2) = γ(l 2, l 1) and γ(l 1, l 3) ≤ γ(l 1, l 2) + γ(l 2, l 3). We call the cost function that γ(l 1, l 2) = 1 if l 1l 2 a unit cost function.

Definition 2

(Edit distance 14) For a cost function γ, the cost of an edit operation e = l 1l 2 is given by γ(e) = γ(l 1, l 2). The cost of a sequence E = e 1, … , e k of edit operations is given by \(\gamma (E) = {\sum }_{i=1}^{k}\gamma (e_{i})\). Then, an edit distance τ Tai(T 1, T 2) between trees T 1 and T 2 is defined as follows:

$$\begin{array}{l} \tau_{\textsc{Tai}}(T_{1},T_{2}) =\min\left\{ \gamma(E)\left| \begin{array}{l} E\text{ is a sequence } \text{of edit operations}\\ \text{transforming }T_{1}\text{ to }T_{2} \end{array} \right.\right\}. \end{array} $$

Definition 3

(Tai mapping 14) Let T 1 and T 2 be trees. We say that a triple (M, T 1, T 2) is an ordered Tai mapping between T 1 and T 2 if MV(T 1) × V(T 2) and every pair (u 1, v 1) and (u 2, v 2) in M satisfies that (1) u 1 = u 2 iff v 1 = v 2 (one-to-one condition), (2) u 1u 2 iff v 1v 2 (ancestor condition) and (3) u 1u 2 iff v 1v 2 (sibling condition). Also we say that a triple (M, T 1, T 2) is an unordered Tai mapping if MV(T 1) × V(T 2) and every pair (u 1, v 1) and (u 2, v 2) in M satisfies (1) one-to-one condition and (2) ancestor condition. The set of all possible Tai mappings between T 1 and T 2 is denoted by \(\mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})\). We will use M instead of (M, T 1, T 2) when there is no confusion, call both an ordered and an unordered Tai mapping a mapping.

In particular, we say that a mapping M is an inclusion mapping from T 1 to T 2 if, for every node vT 1, there exists a node wT 2 such that (v, w) ∈ M. In this case, we denote w by M(v).

Let M be a mapping between T 1 and T 2. Let I M and J M be the sets of nodes in T 1 and T 2 but not in M, that is, I M = {uT 1∣(u, v) ∉ M} and J M = {vT 2∣(u, v) ∉ M}. Then, the cost γ(M) of M is given as follows.

$$\gamma(M) =\sum\limits_{(u,v)\in M}\gamma(u,v)+ \sum\limits_{u\in I_{M}}\gamma(u,\varepsilon)+\sum\limits_{v\in J_{M}}\gamma(\varepsilon,v). $$

Theorem 1

(Tai 14) \(\tau _{\textsc {Tai}}(T_{1},T_{2}) = \min \{\gamma (M)\mid M\in \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})\}\).

Finally, we introduce the notion of a subforest, a common subforest and their arrangement of subtrees illustrated in Fig. 2. Note that we ignore labels of nodes.

Definition 4

(Subforest, common subforest) Let T = (V, E) be a tree.

  1. 1.

    We say that F = (V′, E′) is an embedded subforest in T if V′ ⊆ V and, for every u, vV′, (u, v) ∈ E′ if u < v and there exists no wV′ such that u < w and w < v.

  2. 2.

    We say that F = (V′, E′) is an induced subforest in T if V′ ⊆ V and, for every u, vV′, (u, v) ∈ E′ if (u, v) ∈ E.

  3. 3.

    We say that F = (V′, E′) is a complete subforest in T if F is an induced subforest in T and, for every vV′, all the descendants of v is in V′.

Furthermore, for trees T 1 and T 2, we say that F is a common embedded (resp., induced, complete) subforest between T 1 and T 2 if F is an embedded (resp., induced, complete) subforest in both T 1 and T 2. We call the above three common subforests a common subforest if it is not necessary to distinguish them.

Definition 5

(Arrangement of subtrees in a common subforest) Let F be a common subforest between T 1 and T 2. For a node vF, we denote vT 1F and vT 2F by v 1 and v 2, respectively.

  1. 1.

    We say that F is twisting if there exist u, v, wF such that u 1v 1 < v 1w 1 in T 1 and v 2w 2 < u 2v 2 in T 2. Otherwise F is non-twisting.

  2. 2.

    We say that F is parallel if, for every u, v, wF, u 1v 1 = u 1w 1 in T 1 iff u 2v 2 = u 2w 2 in T 2.

  3. 3.

    We say that F is a subtree if F is a tree, that is, ||F|| = 1.

  4. 4.

    We say that F is a root-preserving subtree if F is a tree T such that r(T) = r(T 1) = r(T 2).

Example 1

Consider trees T 1 and T 2 in Fig. 4. Then, F 1, F 2 and F 3 illustrated in the rightmost column in Fig. 4 are the common embedded subforest, the common induced subforest and the common complete subforest between T 1 and T 2, respectively. Here, every v 1 in T 1 (resp., v 2 in T 2) for a node vF i (i = 1, 2, 3) is illustrated as gray nodes. Also F 1 is a root-preserving subtree and F 2 and F 3 are parallel subforests.

Fig. 4
figure 4

The common embedded subforest F 1, the common induced subforest F 2 and the common complete subforest F 3 between T 1 and T 2

3 Tai Mapping Hierarchy

In this section, we introduce the variations of a Tai mapping.

Definition 6

(Variations of mapping) Let T 1 and T 2 be trees and \(M\in \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})\). We define M as M − {(r(T 1), r(T 2))}.

  1. 1.

    We say that M is a less-constrained mapping [10], denoted by \(M\in \mathcal {M}_{\textsc {Less}}(T_{1},T_{2})\), if M satisfies the following condition.

    $$\forall (u_{1},v_{1}),(u_{2},v_{2}),(u_{3},v_{3})\in M \left( u_{1}\sqcup u_{2}<u_{1}\sqcup u_{3}\Longrightarrow v_{2}\sqcup v_{3}=v_{1}\sqcup v_{3}\right). $$
  2. 2.

    We say that M is an isolated-subtree mapping [17] (or a constrained mapping [19]), denoted by \(M\in \mathcal {M}_{\textsc {Ilst}}(T_{1},T_{2})\), if M satisfies the following condition.

    $$\forall (u_{1},v_{1}),(u_{2},v_{2}),(u_{3},v_{3})\in M \left( u_{3}<u_{1}\sqcup u_{2}\iff v_{3}<v_{1}\sqcup v_{2}\right). $$
  3. 3.

    We say that M is an accordant mapping [9] (or a Lu’s mapping [11]), denoted by \(M\in \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})\), if M satisfies the following condition.

    $$\forall (u_{1},v_{1}),(u_{2},v_{2}),(u_{3},v_{3})\in M \left( u_{1}\sqcup u_{2}=u_{1}\sqcup u_{3} \iff v_{1}\sqcup v_{2}=v_{1}\sqcup v_{3} \right). $$
  4. 4.

    We say that M is an LCA-preserving mapping (or a degree-2 mapping [22]), denoted by \(M\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\), if M satisfies the following condition.

    $$\forall (u_{1},v_{1}),(u_{2},v_{2})\in M^{-} \left( (u_{1}\sqcup u_{2},v_{1}\sqcup v_{2})\in M\right). $$
  5. 5.

    We say that M is an LCA- and root-preserving mapping, denoted by \(M\in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})\), if \(M\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\) and (r(T 1), r(T 2)) ∈ M.

  6. 6.

    We say that M is a top-down mapping [3, 13] (or a degree-1 mapping), denoted by \(M\in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\), if M satisfies the following condition.

    $$\forall (u,v)\in M^{-} \left( (\textit{par}(u),\textit{par}(v))\in M\right). $$
  7. 7.

    We say that M is a bottom-up mapping [9, 15, 18],Footnote 3 denoted by \(M\in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})\), if M satisfies the following condition.

    $$\forall (u,v)\in M \left( \begin{array}{l} \forall u^{\prime}\in T_{1}[u] \exists v^{\prime}\in T_{2}[v]\left( (u^{\prime},v^{\prime})\in M\right)\\ \wedge \forall v^{\prime}\in T_{2}[v] \exists u^{\prime}\in T_{1}[u]\left( (u^{\prime},v^{\prime})\in M\right) \end{array} \right). $$
  8. 8.

    We say that M is a segmental mapping [7], denoted by \(M\in \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})\), if M satisfies the following condition.

    $$\forall (u,v)\in M^{-} \left( \begin{array}{l} \exists (u^{\prime},v^{\prime})\in M \biggl(\left( u^{\prime}\in\textit{anc}(u)\right) \wedge \left( v^{\prime}\in\textit{anc}(v)\right) \biggr)\\ \qquad \Longrightarrow \left( (\textit{par}(u),\textit{par}(v))\in M\right) \end{array} \right). $$
  9. 9.

    We say that M is a top-down segmental mapping [7], denoted by \(M\in \mathcal {M}_{\textsc {TopSg}}(T_{1},T_{2})\), if M is a segmental mapping such that (r(T 1), r(T 2)) ∈ M.

  10. 10.

    We say that a mapping M is an alignable mapping [9], denoted by \(M\in \mathcal {M}_{\textsc {Aln}}(T_{1},T_{2})\), if there exist a forest F, an inclusion mapping M 1 from T 1 to F and an inclusion mapping M 2 from T 2 to F satisfying that M 1(u) = M 2(v) for every (u, v) ∈ M.Footnote 4

  11. 11.

    \(\mathcal {M}_{\mathtt {AB}}(T_{1},T_{2}) = \mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\cap \mathcal {M}_{\mathtt {B}}(T_{1},T_{2})\) for mappings A and B. This paper deals with \(\mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})\), \(\mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})\), \(\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\), \(\mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\), \(\mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\), \(\mathcal {M}_{\textsc {IlstBot}}(T_{1},T_{2})\), \(\mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2})\) and \(\mathcal {M}_{\textsc {LcaBot}}(T_{1},T_{2})\).

Example 2

Figure 5 illustrates mappings M i (1 ≤ i ≤ 8) [7, 9] such that \(M_{1}\in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\) but \(M_{1}\not \in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})\); \(M_{2}\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\) but \(M_{2}\not \in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\); \(M_{3}\in \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})\) but \(M_{3}\not \in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\); \(M_{4}\in \mathcal {M}_{\textsc {Ilst}}(T_{1},T_{2})\) but \(M_{4}\not \in \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})\); \(M_{5}\in \mathcal {M}_{\textsc {Aln}}(T_{1},T_{2})\) but \(M_{5}\not \in \mathcal {M}_{\textsc {Ilst}}(T_{1},T_{2})\); \(M_{6}\in \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})\) but \(M_{6}\not \in \mathcal {M}_{\textsc {Aln}}(T_{1},T_{2})\); \(M_{7}\in \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})\) but \(M_{7}\not \in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\) and \(M_{7}\not \in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})\); \(M_{8}\in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})\) but \(M_{8}\not \in \mathcal {M}_{\textsc {Aln}}(T_{1},T_{2})\).

Fig. 5
figure 5

Mappings M i (1 ≤ i ≤ 8) in Example 2

In particular, for a segmental mapping \(M\in \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})\), the formula in Definition 6.8 claims that, for (u, v),(u′, v′) ∈ M such that u′ ∈ anc(u) and v′ ∈ anc(v), there exist nodes u 1, … , u k T 1 and v 1, … , v k T 2 such that u 1 = u, u k = u′, v 1 = v, v k = v′, u i+1 = par(u i ), v i+1 = par(v i ) (1 ≤ ik − 1) and (u i , v i ) ∈ M (1 ≤ ik).

The following lemma has been shown in [7, 9, 17] or holds from Definition 6.

Lemma 1

The following statements hold.

  1. 1.

    \(\mathcal {M}_{\textsc {Iso}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2}) \subset \mathcal {M}_{\textsc {Ilst}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Aln}} (T_{1},T_{2})\subset \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})\) [9, 17].

  2. 2.

    \(\mathcal {M}_{\textsc {Iso}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2}) \subset \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})\) [7].

  3. 3.

    \(\mathcal {M}_{\mathtt {AB}}(T_{1},T_{2})\subseteq \mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\) for mappings A and B.

  4. 4.

    If \(\mathcal {M}_{\mathtt {B}}(T_{1},T_{2})\subseteq \mathcal {M}_{\mathtt {C}}(T_{1},T_{2})\), then \(\mathcal {M}_{\mathtt {AB}}(T_{1},T_{2})\subseteq \mathcal {M}_{\mathtt {AC}}(T_{1},T_{2})\) for mappings A, B and C.

Theorem 2

The hierarchy illustrated in Fig. 2 holds, where the mapping \(\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\) is denoted by its subscript A. In particular, the following statements hold. Here, S # Sdenotes that neither SSnor S′ ⊆ S for two sets S and S′.

  1. 1.

    \(\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})\) and \(\mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {IlstBot}}(T_{1},T_{2})\)

  2. 2.

    \(\mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})~\#~\mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})\) , \(\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\) and \(\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\)

  3. 3.

    \(\mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\) , \(\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})\) and \(\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})\).

  4. 4.

    \(\mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})~\#~\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\).

  5. 5.

    \(\mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})\) and \(\mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})\).

  6. 6.

    \(\mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})\).

  7. 7.

    \(\mathcal {M}_{\textsc {LcaBot}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})\).

Proof

  1. 1.

    We just show that \(\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})\). We can show that \(\mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {IlstBot}}(T_{1},T_{2})\) in the similar way. Since Lemma 1 implies that \(\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})\), we just show the converse direction that \(\mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\).

    Suppose that \(M\in \mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})\) and let (u 1, v 1),(u 2, v 2),(u 3, v 3) ∈ M. Then, it holds that u 3 < u 1u 2v 3 < v 1v 2. Also suppose that \(M\not \in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\). Consider the case that there exist (u 1, v 1),(u 2, v 2),(u 3, v 3) ∈ M such that u 1u 2 = u 1u 3 but v 1v 2v 1v 3.

    Suppose that u 1 # u 2, u 2 # u 3 and u 3 # u 1. Since M is a mapping, it holds that v 1 # v 2, v 2 # v 3 and v 3 # v 1. Since v 1v 2v 1v 3 and T 2 is a tree, it holds that either (1) v 1v 2 < v 1v 3 or (2) v 1v 3 < v 1v 2 in T 2. (See Fig. 6).

    For the case (1), it holds that v 3v 1v 2, which is a contradiction that u 3 < u 1u 2v 3 < v 1v 2. For the case (2), it holds that v 2v 1v 3, which is a contradiction that u 2 < u 1u 3v 2 < v 1v 3. (Note that this discussion holds for a non-segmental mapping).

    Suppose that u 1 # u 2, u 1 < u 3 and u 2 < u 3. Since u 1u 2 = u 1u 3, it holds that u 1u 2 = u 3 in T 1. On the other hand, since v 1v 2v 1v 3, it holds that v 1v 2 < v 1v 3 = v 3 in T 2. (See the case (3) in Fig. 6).

    Since M is a segmental mapping, there exists a node u′ ∈ T 1 such that (u′, v 1v 2) ∈ M. Since v 1v 2 < v 3, it holds that u′ < u 3 = u 1u 2. On the other hand, since v 1 < v 1v 2 and v 2 < v 1v 2, it holds that u 1 < u′ and u 2 < u′, that is, u 3 = u 1u 2u′, which is a contradiction. The above discussion also holds for the case that there exist (u 1, v 1),(u 2, v 2),(u 3, v 3) ∈ M such that v 1v 2 = v 1v 3 but u 1u 2u 1u 3.

    Hence, for every (u 1, v 1),(u 2, v 2),(u 3, v 3) ∈ M, it holds that u 1u 2 = u 1u 3v 1v 2 = v 1v 3, which implies that \(M\in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\).

    In order to show the other statements, it is necessary to show the inclusion and the properness.

    For the inclusion, by Lemma 1, it is sufficient to show that \(\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})\) and \(\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\). Since (r(T 1), r(T 2)) ∈ M for every \(M\in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\) and \(\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\), it holds that \(M\in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})\). Since \(\mathcal {M}_{\textsc {Top}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {TopSg}}(T_{1},T_{2})\) [7] and \(\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\), it holds that \(\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\) by Lemma 1.4.

    On the other hand, we show the properness by using the mappings M i (1 ≤ i ≤ 9) in Fig. 7.

  2. 2.

    It holds that \(M_{1}\in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\) but \(M_{1}\not \in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\) and \(M_{1}\not \in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})\); \(M_{2}\in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})\) but \(M_{2}\not \in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\) and \(M_{2}\not \in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\); \(M_{3}\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\) but \(M_{3}\not \in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\) and \(M_{3}\not \in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})\).

  3. 3.

    It holds that \(M_{4}\in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\) but \(M_{4}\not \in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\); \(M_{5}\in \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})\) but \(M_{5}\not \in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\); \(M_{6}\in \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})\) but \(M_{6}\not \in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\).

  4. 4.

    It holds that \(M_{3}\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\) but \(M_{3}\not \in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\); \(M_{4}\in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\) but \(M_{4}\not \in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\).

  5. 5.

    It holds that \(M_{7}\in \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})\) but \(M_{7}\not \in \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})\). Also it holds that \(M_{7}\in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})\) but \(M_{7}\not \in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\).

  6. 6.

    It holds that \(M_{8}\in \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})\) but \(M_{8}\not \in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\).

  7. 7.

    It holds that \(M_{9}\in \mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2})\) but \(M_{9}\not \in \mathcal {M}_{\textsc {LcaBot}}(T_{1},T_{2})\); \(M_{6}\in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\) but \(M_{6}\not \in \mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2})\); \(M_{7}\in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})\) but \(M_{7}\not \in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\). □

Fig. 6
figure 6

The cases (1), (2) and (3)

Fig. 7
figure 7

Mappings M i (1 ≤ i ≤ 9) in the proof of Theorem 2

Remark 1

We can characterize a common subforest between T 1 and T 2 by using a mapping M. Recall that I M and J M are the sets of nodes in T 1 and T 2 but not in M. Then, for T 1 = (V 1, E 1) and T 2 = (V 2, E 2), M induces a common subforest F = (V′, E′) between T 1 and T 2 such that V′ = V 1I M = V 2J M . We call such a common subforest a common subforest through M.

Since the segmental mapping requires to preserve the parent-children relationship as possible, a common subforest through the segmental mapping and its sub-mappings, that is, Sg, SgAln, AccSg, LcaSg and Top, is an induced subforest. These mappings require the operational condition that the deletion and the insertion are allowed to apply just leaves or roots. In particular, the mappings in Top require the operational condition that the deletion and the insertion are allowed to apply just leaves.

Since the bottom-up mapping requires to preserve the complete subtrees, a common subforest through the bottom-up mapping and its sub-mappings, that is, Bot, BotAln, AccBot, LcaBot and Iso, is a complete subforest. These mappings require the operational condition that the deletion and the insertion are allowed to apply just roots.

Remark 2

Since the LCA-preserving mapping provides the root as an LCA of all the nodes in a common subforest, the common subforest through the LCA-preserving mapping and its sub-mappings, that is, Lca, LcaSg, LcaBot, LcaRt, Top and Iso, is a subtree. In particular, the LCA- and root-preserving mapping contains a pair of roots in given trees, the common subforest through the LCA- and root-preserving mapping and its sub-mappings, that is, LcaRt, Top and Iso, is a root-preserving subtree.

Since the definition of the accordant mapping is same as a parallel common forest, the common subforest through the accordant mapping and its sub-mappings, that is, Acc, AccSg and AccBot, is parallel.

As similar as a twisting common subforest, we say that a mapping M between trees T 1 and T 2 has a twist if there exist pairs (u 1, v 1),(u 2, v 2),(u 3, v 3) ∈ M satisfying the following condition.

$$(u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \wedge (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2}). $$

We call such pairs (u 1, v 1),(u 2, v 2) and (u 3, v 3) a twist of M.

Since (v 1v 2v 2v 3) ≡ (v 1v 3 = v 2v 3) and ¬(v 2v 3 < v 1v 2) ≡ (v 1v 2v 2v 3) for every tree T and v 1, v 2, v 3T, we can transform the formula representing the nonexistence of twists in M logically as follows.

$$\begin{array}{ll} & \neg\exists (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \wedge (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \neg\left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \wedge (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( \neg (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \vee\neg (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \Longrightarrow\neg (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \Longrightarrow (v_{1}\sqcup v_{2}\le v_{2}\sqcup v_{3})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \Longrightarrow (v_{1}\sqcup v_{3}= v_{2}\sqcup v_{3})\right). \end{array} $$

Hence, a mapping M is less-constrained if and only if M has no twists. As a result, the common subforest through the alignable mapping and its sub-mappings, that is, Aln, SgAln and BotAln, is non-twisting.

Definition 7

(Variations of edit distance) For every \(\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\) in a mapping hierarchy in Fig. 2, the variation τ A(T 1, T 2) of the edit distance τ Tai is defined as the minimum cost of mappings in \(\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\):

\(\tau _{\mathtt {A}}(T_{1},T_{2}) = \min \{\gamma (M)\mid M\in \mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\}\).

Remark 3

If the cost function is a metric, then it holds that τ LcaRt(T 1, T 2) = τ Lca(T 1, T 2) = τ Acc(T 1, T 2) [9]. On the other hand, consider the trees T 1 and T 2 in Fig. 8. Then, we obtain the minimum cost mapping \(M_{1}\in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\) and the minimum cost mapping \(M_{2}\in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\) under the unit cost function γ in Fig. 8. Hence, it holds that τ AccSg(T 1, T 2) = γ(M 1) = 4 < 6 = τ LcaSg(T 1, T 2) = γ(M 2).

Fig. 8
figure 8

Trees T 1 and T 2 and mappings M 1 and M 2 in Example 3

It is known that τ Tai, τ Ilst, τ Lca( = τ Acc), τ Top, τ Sg, and τ Bot are metrics, whereas τ Aln is not [7, 9]. In the remainder of this section, we investigate whether or not τ A is a metric for the newly introduced mappings A ∈ {SgAln, AccSg, LcaSg, BotAln, AccBot, LcaBot, LcaRt}.

Let \(M_{i}\in \mathcal {M}_{\mathtt {A}}(T_{i},T_{i+1})\) for i = 1, 2. Then, we define the composition M 1M 2 of mappings M 1 and M 2 as follows.

$$M_{1}\circ M_{2} = \left\{ \begin{array}{l} (u,w)\\ ~\in V(T_{1})\times V(T_{3}) \end{array} \left| \begin{array}{l} \exists v\in V(T_{2})\\ ~\text{ s.t. } (u,v)\in M_{1}\text{ and }(v,w)\in M_{2} \end{array} \right.\right\}. $$

Lemma 2

(7, 9) Let \(M_{i}\in \mathcal {M}_{\mathtt {A}}(T_{i},T_{i+1})\) for i = 1, 2, where:

$$\mathtt{A}\in\{\textsc{Tai},\textsc{Ilst}, \textsc{Acc},\textsc{Lca},\textsc{Top},\textsc{Sg},\textsc{Bot}\}. $$

Then, the following statements hold.

  1. 1.

    \(M_{1}\circ M_{2}\in \mathcal {M}_{\mathtt {A}}(T_{1},T_{3})\).

  2. 2.

    For a cost function γ that is a metric, it holds that γ(M 1 ∘M 2) ≤ γ(M 1) + γ(M 2).

Lemma 3

If both \(\mathcal {M}_{\mathtt {A}}\) and \(\mathcal {M}_{\mathtt {B}}\) satisfy the statements in Lemma 2, then so does \(\mathcal {M}_{\textsc {{\tt AB}}}\).

Proof

Suppose that \(M_{i}\in \mathcal {M}_{\mathtt {AB}}(T_{i},T_{i+1})\) for i = 1, 2. By the definition that \(\mathcal {M}_{\textsc {\tt AB}}(T_{i},T_{i+1}) = \mathcal {M}_{\mathtt {A}}(T_{i},T_{i+1})\cap \mathcal {M}_{\mathtt {B}}(T_{i},T_{i+1})\) , it holds that \(M_{i}\in \mathcal {M}_{\textsc {\tt A}}(T_{i},T_{i+1})\) and \(M_{i}\in \mathcal {M}_{\mathtt {B}}(T_{i},T_{i+1})\) . By Lemma 2.1 for \(\mathcal {M}_{\mathtt {A}}\) and \(\mathcal {M}_{\mathtt {B}}\), it holds that \(M_{1}\circ M_{2}\in \mathcal {M}_{\mathtt {A}}(T_{1},T_{3})\) and \(M_{1}\circ M_{2}\in \mathcal {M}_{\textsc {\tt B}}(T_{1},T_{3})\) . Hence, it is obvious that \(M_{1}\circ M_{2}\in \mathcal {M}_{\mathtt {AB}}(T_{1},T_{3})\), so Lemma 2.1 holds for \(\mathcal {M}_{\mathtt {AB}}\). Since the proof of Lemma 2.2 just depends on Lemma 2.1, Lemma 2.2 also holds for \(\mathcal {M}_{\mathtt {AB}}\). □

Theorem 3

(Metricity) If a cost function is a metric, then so are τ AccSg , τ LcaSg , τ AccBot , τ LcaBot and τ LcaRt . On the other hand, neither τ SgAln nor τ BotAln is a metric even if a cost function is a metric.

Proof

First, we show that τ AccSg is a metric. To do this, it is sufficient to show that τ AccSg(T 1, T 2) ≥ 0, τ AccSg(T 1, T 2) = 0 iff T 1T 2, τ AccSg(T 1, T 2) = τ AccSg(T 2, T 1) and τ AccSg(T 1, T 3) ≤ τ AccSg(T 1, T 2) + τ AccSg(T 2, T 3). The first three statements follow from the definition of \(\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\). Let M i be the minimum cost accordant segmental mapping between T i and T i+1 (i = 1,2). By Lemma 2 and 3, it holds that τ AccSg(T 1, T 3) ≤ γ(M 1M 2) ≤ γ(M 1) + γ(M 2) = τ AccSg(T 1, T 2) + τ AccSg(T 2, T 3). By using the same proof, we can show that τ LcaSg, τ AccBot and τ LcaBot are metrics. Also it is obvious that τ LcaRt is a metric, because τ Lca is a metric.

On the other hand, in order to show that neither τ SgAln nor τ BotAln is a metric, consider the trees T 1, T 2 and T 3 in Fig. 9 and suppose that γ is a unit cost function. Then, we obtain the minimum cost mapping \(M_{12}\in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\), \(M_{13}\in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{3})\) and \(M_{23}\in \mathcal {M}_{\textsc {BotAln}}(T_{2},T_{3})\) under γ, respectively, in Fig. 9.

Fig. 9
figure 9

Trees T 1, T 2 and T 3 and mappings M 12, M 13 and M 23 in the proof of Theorem 3

Hence, it holds that 8 = γ(M 12) = τ BotAln(T 1, T 2) > τ BotAln(T 1, T 3) + τ BotAln(T 2, T 3) = γ(M 13) + γ(M 23) = 3 + 3 = 6. Since this statement holds for τ SgAln, neither τ SgAln nor τ BotAln satisfies the triangle inequality, neither is a metric. □

4 Time Complexity of Computing Distances

In this section, we investigate the time complexity of computing the variations of the edit distance for both ordered and unordered trees. Here, we identify a node with its postorder number. In the remainder of this paper, let n = |T 1|, m = |T 2|, D = max{d(T 1), d(T 2)} and d = min{d(T 1), d(T 2)}. For a mapping A, we denote the distance between ordered and unordered trees by \(\tau _{\mathtt {A}}^{o}\) and \(\tau _{\mathtt {A}}^{u}\), respectively, whereas the distance between ordered and unordered forests by \(\delta _{\mathtt {A}}^{o}\) and \(\delta _{\mathtt {A}}^{u}\), respectively.

4.1 Tractable Cases

In this section, we investigate the tractable cases of computing the variations of the edit distance.

Remark 4

According to [7], for every pair (i, j) ∈ T 1 × T 2 (1 ≤ in,1 ≤ jm), we can compute the top-down distance \(\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])\) between T 1[i] and T 2[j] in O(n m) time [7].

On the other hand, we can compute \(\tau _{\textsc {Top}}^{u}(T_{1}[i],T_{2}[j])\) by using the recurrences in Fig. 10 [18]. Here, BM is the maximum weighted bipartite matching with weight ω in a complete bipartite graph G M = (X, Y, E) such that X and Y are the sets of children of i in T 1 and j in T 2 and the weight of an edge (i′, j′) ∈ E is set to \(\tau _{\textsc {Top}}^{u}(T_{1}[i^{\prime }],\emptyset )+\tau _{\textsc {Top}}^{u}(\emptyset ,T_{2}[j^{\prime }])- \tau _{\textsc {Top}}^{u}(T_{1}[i^{\prime }],T_{2}[j^{\prime }])\) [18, 22]. Hence, we can store \(\tau _{\textsc {Top}}^{u}(T_{1}[i],T_{2}[j])\) for every (i, j) ∈ T 1 × T 2 in O(n m d) time.

Fig. 10
figure 10

The recurrences of computing \(\tau _{\textsc {Top}}^{u}(T_{1},T_{2})\) for unordered trees

Theorem 4

We can compute \(\tau _{\textsc {LcaRt}}^{o}(T_{1},T_{2})\) in O(nm) time and \(\tau _{\textsc {LcaRt}}^{u}(T_{1},T_{2})\) in O(nmd) time.

Proof

We can design the recurrences of computing τ LcaRt by adding

$$\tau_{\textsc{LcaRt}}(T_{1}[r_{1}],T_{2}[r_{2}]) = \delta_{\textsc{Lca}}(T_{1}(r_{1}),T_{2}(r_{2}))+\gamma(r_{1},r_{2}) $$

to the recurrences of computing τ Lca [18, 22] (for both ordered and unordered trees), where r i is the root of T i (i = 1, 2). Hence, we can compute \(\tau _{\textsc {LcaRt}}^{o}(T_{1},T_{2})\) in O(n m) time and \(\tau _{\textsc {LcaRt}}^{u}(T_{1},T_{2})\) in O(n m d) time. □

Theorem 5

We can compute \(\tau _{\textsc {LcaSg}}^{o}(T_{1},T_{2})\) in O(nm) time and \(\tau _{\textsc {LcaSg}}^{u}(T_{1},T_{2})\) in O(nmd) time.

Proof

For every pair (i, j) ∈ T 1 × T 2, we define the distance d(T 1, i, T 2, j) between T 1 and T 2 based on \(\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])\) as follows.

$$d(T_{1},i,T_{2},j) = \tau_{\textsc{Top}}^{o}(T_{1}[i],T_{2}[j]) +\sum\limits_{i^{\prime}\in T_{1}-T_{1}[i]}\gamma(i^{\prime},\varepsilon) +\sum\limits_{j^{\prime}\in T_{2}-T_{2}[j]}\gamma(\varepsilon, j^{\prime}). $$

By Remark 4, it holds that

$$\tau_{\textsc{LcaSg}}^{o}(T_{1},T_{2}) =\min\{ d(T_{1},i,T_{2},j) \mid 1\le i\le n,1\le j\le m \}, $$

which we can compute in O(n m) time. Also, by Remark 4, we can replace \(\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])\) in d(T 1, i, T 2, j) with \(\tau _{\textsc {Top}}^{u}(T_{1}[i],T_{2}[j])\) for computing \(\tau _{\textsc {LcaSg}}^{u}\), which runs in O(n m d) time. □

Theorem 6

We can compute \(\tau _{\textsc {AccSg}}^{o}(T_{1},T_{2})\) in O(nm) time and \(\tau _{\textsc {SgAln}}^{o}(T_{1},T_{2})\) in O(nmD 2 ) time.

Proof

Consider the recurrences of computing \(\tau _{\textsc {AccSg}}^{o}\) and \(\tau _{\textsc {SgAln}}^{o}\) in Fig. 11. The difference between the recurrences of computing \(\tau _{\textsc {AccSg}}^{o}\) and those of \(\tau _{\textsc {Acc}}^{o}\) [9] is the first two formulas in \(\tau _{\textsc {AccSg}}^{o}\), and the difference between the recurrences of computing \(\tau _{\textsc {SgAln}}^{o}\) and those of \(\tau _{\textsc {Aln}}^{o}\) [6] is the first formula in \(\tau _{\textsc {SgAln}}^{o}\), respectively. The first formula in \(\tau _{\textsc {AccSg}}^{o}\) and \(\tau _{\textsc {SgAln}}^{o}\) computes the distance for segmentations of (i, j) ∈ T 1 × T 2. In the second formula in \(\tau _{\textsc {AccSg}}^{o}\), we replace γ(i, j) in \(\tau _{\textsc {Acc}}^{o}\) with γ(i, ε) + γ(ε, j). Since we can compute them in constant time (after computing \(\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])\) in advance according to Remark 4) and by the same discussion of [9] and [6], the statement holds. □

Fig. 11
figure 11

The recurrences of computing \(\tau _{\textsc {AccSg}}^{o}(T_{1},T_{2})\) and \(\tau _{\textsc {SgAln}}^{o}(T_{1},T_{2})\) for ordered trees

Theorem 7

We can compute \(\tau _{\textsc {AccSg}}^{u}(T_{1},T_{2})\) in O(nmd) time.

Proof

We can compute \(\tau _{\textsc {AccSg}}^{u}(T_{1},T_{2})\) in O(n m d) time by using the recurrences in Fig. 12 after storing every \(\tau _{\textsc {Top}}^{u}(T_{1}(i),T_{2}(j))\) ((i, j) ∈ T 1 × T 2) in O(n m d) time according to the recurrences in Fig. 10 of Remark 4. □

Fig. 12
figure 12

The recurrences of computing \(\tau _{\textsc {AccSg}}^{u}(T_{1},T_{2})\) for unordered trees

Next, we investigate the variations of the bottom-up distance τ Bot.

Remark

We say that two trees T 1 and T 2 are ordered (resp., unordered) label-free isomorphic, denoted by \(T_{1}{\equiv _{l}^{o}} T_{2}\) (resp., \(T_{1}{\equiv _{l}^{u}} T_{2}\)), if there exists a mapping \(M\in \mathcal {M}_{\textsc {Tai}}^{o}(T_{1},T_{2})\) (resp., \(M\in \mathcal {M}_{\textsc {Tai}}^{u}(T_{1},T_{2})\)), called a label-free isomorphism, such that I M = J M = . Also we define diff r(T 1, T 2) (r ∈ {o, u}) as follows, where M is a label-free isomorphism between T 1 and T 2.

$$\begin{array}{rcl} \textit{diff}^{r}(T_{1},T_{2}) & = & \left\{ \begin{array}{l@{~~~}l} \sum\limits_{(i,j)\in M}\gamma(i,j) & \text{if }T_{1}{\equiv_{l}^{r}} T_{2},\\ \sum\limits_{i\in T_{1}}\gamma(i,\varepsilon)+{\sum}_{j\in T_{2}}\gamma(\varepsilon,j) & \text{if }T_{1}{\not\equiv_{l}^{r}} T_{2}. \end{array} \right. \end{array} $$

We can check \(T_{1}{\equiv _{l}^{r}} T_{2}\) and compute diff r(T 1, T 2) in O(n + m) time [15].

Theorem 8

We can compute \(\tau _{\textsc {LcaBot}}^{o}(T_{1},T_{2})\) and \(\tau _{\textsc {LcaBot}}^{u}(T_{1},T_{2})\) in O(nm) time.

Proof

For r ∈ {o, u}, by replacing \(\tau _{\textsc {Top}}^{r}(T_{1}[i],T_{2}[j])\) in d(T 1, i, T 2, j) in Theorem 5 with diff r(T 1[i], T 2[j]) and by Remark 5, we can compute \(\tau _{\textsc {LcaBot}}^{r}(T_{1},T_{2})\) in O(n m) time. □

Theorem 9

We can compute \(\tau _{\textsc {AccBot}}^{o}(T_{1},T_{2})\) in O(nm) time and \(\tau _{\textsc {BotAln}}^{o}(T_{1},T_{2})\) in O(nmD 2 ) time.

Proof

Remark 5 implies that we can compute \(\tau _{\textsc {AccBot}}^{o}\) and \(\tau _{\textsc {BotAln}}^{o}\) by replacing \(\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])\) in the recurrences of computing \(\tau _{\textsc {AccSg}}^{o}\) and \(\tau _{\textsc {SgAln}}^{o}\) in Fig. 11 with diff o(T 1[i], T 2[j]) in O(n m) time and in O(n m D 2) time, respectively. □

Theorem 10

We can compute \(\tau _{\textsc {AccBot}}^{u}(T_{1},T_{2})\) in O(nmd) time.

Proof

We extend the function diff u for unordered trees in Remark 5 to a function fdiff u for forests. Let F 1 = [S 1, … , S k ] and F 2 = [T 1, … , T l ] be forests. First, for X = {1, … , k} and Y = {1, … , l}, we construct a weighted bipartite graph G = (X, Y, E) such that (i, j) ∈ E if and only if \(S_{i}{\equiv _{l}^{u}} T_{j}\) and the weight ω((i, j)) = |S i |+|T j |−diff u(S i , T j ). Also, let ⊆ X × Y be the maximum weighted bipartite matching of G, \(\mathtt {BM}_{X}^{-}=\{i\in X\mid (i,j)\not \in \mathtt {BM}\}\) and \({\tt BM}_{Y}^{-}=\{j\in Y\mid (i,j)\not \in \mathtt {BM}\}\). Then, we define fdiff u(F 1, F 2) as follows.

$$\textit{fdiff}^{u}(F_{1},F_{2}) = \sum\limits_{(i,j)\in \mathtt{BM}}\textit{diff}^{u}(S_{i},T_{j}) +\sum\limits_{i\in {\mathtt{BM}_{X}^{-}}}|S_{i}| +\sum\limits_{j\in {\mathtt{BM}_{X}^{-}}}|T_{j}|. $$

It is obvious that fdiff u(F 1, F 2) computes the smallest value when selecting pairs of label-free isomorphic trees in F 1 and F 2, which is \(\delta _{\textsc {AccBot}}^{u}(F_{1},F_{2})\). Hence, by replacing \(\tau _{\textsc {Top}}^{u}(T_{1}[i],T_{2}[j])\) and \(\delta _{\textsc {Top}}^{u}(T_{1}(i),T_{2}(j))\) in Fig. 12 with diff u(T 1[i], T 2[j]) and fdiff u(T 1(i), T 2(j)), we can compute \(\tau _{\textsc {AccBot}}^{u}(T_{1},T_{2})\) in O(n m d) time. □

4.2 Intractable Cases

Suppose that π1 and π2 be two optimization problems. Then, we say that π1 L c-reduces to π2 if there exist polynomial-time algorithms f, g and constants α, β > 0 satisfying the following statements for an instance I of π1:

  1. 1.

    opt(f(I)) ≤ αopt(I).

  2. 2.

    For a solution of f(I) with weight s 2, the algorithm g produces in polynomial time a solution of I with weight s 1 such that |s 1opt(I)| ≤ β⋅|s 2opt(f(I))|.

If π1 L-reduces π2, and π2 can be approximated in polynomial time within a factor 1 + ε, then π1 can be approximated within the factor 1 + α β ε. If π2 has a polynomial time approximation scheme (PTAS), then so does π1 [12].

A problem is MAX SNP-hard if every problem in MAX SNP can be L-reduced to it. Since the composition of two L-reductions is also an L-reduction, a problem is MAX SNP-hard if a MAX SNP-hard problem can be L-reduced it. It is known that if any MAX SNP-hard problem has a PTAS, then P=NP. Hence, it is very unlikely for a MAX SNP-hard problem to have a PTAS [12].

Concerned with this paper, Zhang and Jiang [21] have first shown that the problem of computing \(\tau _{\textsc {Tai}}^{u}(T_{1},T_{2})\) is MAX SNP-hard. Akutsu et al. [1] and Hirata et al. [5] have shown that this problem is also MAX SNP-hard even if the height of T 1 and T 2 is at most 2 and T 1 and T 2 are binary trees, respectively. Furthermore, Yamamoto et al. [18] have shown that the problem of computing \(\tau _{\textsc {Bot}}^{u}(T_{1},T_{2})\) is MAX SNP-hard even if the degrees of T 1 and T 2 are binary, which implies the same MAX SNP-hardness of the problem of computing \(\tau _{\textsc {Sg}}^{u}(T_{1},T_{2})\).

Note that we cannot apply this proof to showing that the problems of computing \(\tau _{\textsc {SgAln}}^{u}(T_{1},T_{2})\) and \(\tau _{\textsc {BotAln}}^{u}(T_{1},T_{2})\) are MAX SNP-hard, with preserving alignable mappings. Hence, in this paper, we give the similar proof of [18] to show that the problems of computing \(\tau _{\textsc {SgAln}}^{u}(T_{1},T_{2})\) and \(\tau _{\textsc {BotAln}}^{u}(T_{1},T_{2})\) are MAX SNP-hard, by using L-reduction from MAX 3SC-3:

Maximum Bounded Covering by 3-Sets (MAX 3SC-3) [8]:

Instance: A finite set S and a collection \(\mathcal {C}\) of 3-element subsets of S, where every element of S occurs in at most three of the subsets in \(\mathcal {C}\).

Solution: The largest covering of S, where a covering is a collection of mutually disjoint sets in \(\mathcal {C}\).

Let S = {s 1, … , s m }, \(\mathcal {C}=\{C_{1},\ldots ,C_{n}\}\) and C i = {s i1, s i2, s i3} be an instance of MAX 3SC-3, where s i1, s i2, s i3S. Also, let \(\mathcal {C}^{*}\) be the largest covering of S. We assume that the cost function γ is a unit cost function.

We construct three trees \(\hat {C}_{i}\) for C i (1 ≤ in), \(\hat {s}_{j}\) for s j (1 ≤ jm) and \(\hat {D}\) illustrated in Fig. 13 (left), where λ is a new label. Then, we construct two trees T 1 and T 2 illustrated in Fig. 13 (right). We call the transformation from an instance of MAX 3SC-3 to trees T 1 and T 2 f. Then, we can show the following lemma as same as [18].

Fig. 13
figure 13

Trees \(\hat {C}_{i}\) (1 ≤ in), \(\hat {s}_{j}\) (1 ≤ jm) and \(\hat {D}\) (left) and trees T 1 and T 2 (right)

Lemma 4

For A ∈ {SgAln, BotAln}, let M be the minimum cost mapping in \(\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\) Then, for every \(\hat {C_{i}}\) in T 1 , M maps (a) all of the three subtree \(\hat {s_{i1}}\) , \(\hat {s_{i2}}\) and \(\hat {s_{i3}}\) in \(\hat {C_{i}}\) to the same three subtrees in T 2 or (b) \(\hat {C_{i}}\) to some dummy subtree \(\hat {D}\) in T 2.

Remark 6

Lemma 4 does not hold for a mapping in \(\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\) such that ∈ {Ilst, Acc, AccSg, AccBot}. For i and i′, suppose that \(\hat {C_{i}}\) satisfies (a) and \(\hat {C_{i^{\prime }}}\) satisfies (b) in Lemma 4. Let v i j (resp., w i j ) denotes the leaf in T 1 (resp., T 2) labeled by \(\hat {s_{ij}}\) and w λ denotes some leaf in T 2 labeled by λ.

Then, \(M=\{ (v_{i1},w_{i1}), (v_{i2},w_{i2}), (v_{i^{\prime }1},w_{\lambda })\}\) is corresponding to the part of the mapping between T 1 and T 2 in Lemma 4. However, it holds that w λ < w i1w i2 but \(v_{i^{\prime }1}\not <v_{i1}\sqcup v_{i2}\). Hence, M is not an isolated-subtree mapping.

Lemma 5

For A ∈ {SgAln, BotAln}, \(\tau _{\mathtt {A}}^{u}(T_{1},T_{2}) = 9n+3m-|\mathcal {C}^{*}|+2\).

Proof

Let M be the minimum cost mapping in \(\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\), and \(k=|\mathcal {C}^{*}|\).

By Lemma 4, for every \(C_{i}\in \mathcal {C}^{*}\), M maps the three subtrees \(\hat {s}_{i1}\), \(\hat {s}_{i2}\) and \(\hat {s}_{i3}\) in \(\hat {C}_{i}\) to the same three subtrees in T 2. Since the cost of a mapping between \(\hat {C}_{i}\) in T 1 and \(\hat {s}_{i1}\), \(\hat {s}_{i2}\) and \(\hat {s}_{i3}\) in T 2 is 4 (which is the number of non-mapped nodes in \(\hat {C}_{i}\) in T 1) and \(|\mathcal {C}^{*}|=k\), the cost of M concerned with \(\mathcal {C}^{*}\) is 4k. Also, by Lemma 4, for every \(C_{i}\in \mathcal {C}-\mathcal {C}^{*}\), M maps \(\hat {C}_{i}\) to some dummy tree \(\hat {D}\) in T 2. Since the cost of a mapping between \(\hat {C}_{i}\) in T 1 and \(\hat {D}\) in T 2 is 9 (which is the number of pairs of nodes labeled by s i j (j = 1,2,3) in T 1 and λ in T 2) and \(|\mathcal {C}-\mathcal {C}^{*}|=n-k\), the cost of M concerned with \(\mathcal {C}-\mathcal {C}^{*}\) is 9(nk). Since M does not touch m − 3k subtrees \(\hat {s}_{j}\) of s j in T 2, n − (nk) = k dummy subtrees \(\hat {D}\) in T 2 and 2 roots labeled by r, the concerned cost of M is 3(m − 3k)+13k+2.

Hence, it holds that \(\tau _{\mathtt {A}}^{u}(T_{1},T_{2}) = \gamma (M) = 4k+9(n-k)+3(m-3k)+13k+2= 9n+3m-k+2\). □

Theorem 11

For A ∈ {SgAln, BotAln}, the problem of computing \(\tau _{\mathtt {A}}^{u}(T_{1},T_{2})\) is MAX SNP-hard.

Proof

Consider the algorithm g to transform from a mapping \(M\in \mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\) between T 1 and T 2 to a covering \(\mathcal {C}^{\prime }\) of \(\mathcal {C}\) as follows:

If M maps \(\hat {s}_{i1}\), \(\hat {s}_{i2}\) and \(\hat {s}_{i3}\) of \(\hat {C}_{i}\) in T 1 to the same three subtrees in T 2, then add C i to \(\mathcal {C}^{\prime }\).

It holds that n/7 ≤ opt(I) from the proof of Theorem 7 in [21]. Then, by Lemma 5 and since m ≤ 3n and opt(I) ≥ 1, the following inequalities hold.

$$\begin{array}{rcl} \textit{opt}(f(I)) &= &9n+3m-\textit{opt}(I)+2 \le 18n-\textit{opt}(I)+2 \le 127\cdot\textit{opt}(I),\\ s_{2}-\textit{opt}(f(I)) &= &\gamma(M)-\tau_{\mathtt{A}}^{u}(T_{1},T_{2})\\ & \ge & 9n+3m-|\mathcal{C}^{\prime}|+2 -(9n+3m-\textit{opt}(I)+2)\\ & = & \textit{opt}(I)-|\mathcal{C}^{\prime}| = \textit{opt}(I)-s_{1}. \end{array} $$

Hence, (f, g) is an L-reduction from MAX 3SC-3 to this problem. □

Whereas the problem of computing \(\tau _{\textsc {Aln}}^{u}(T_{1},T_{2})\) is MAX-SNP hard, it is tractable if the degrees of trees are bounded by some constant [6]. As same as \(\tau _{\textsc {Aln}}^{u}(T_{1},T_{2})\), the following theorem also holds.

Theorem 12

The problems of computing \(\tau _{\textsc {SgAln}}^{u}(T_{1},T_{2})\) and \(\tau _{\textsc {BotAln}}^{u}(T_{1},T_{2})\) are tractable if the degrees of T 1 and T 2 are bounded by some constant.

Proof

We can design the algorithm to compute \(\tau _{\textsc {SgAln}}^{u}\) by replacing \(\tau _{\textsc {Top}}^{o}\) in \(\tau _{\textsc {SgAln}}^{o}\) in Fig. 11 with \(\tau _{\textsc {Top}}^{u}\) in Fig. 10, and by improving the recurrences of computing \(\delta _{\textsc {SgAln}}^{o}\) in Fig. 11 to the recurrences in Fig. 14, which is same as the improvement of the recurrences from computing \(\delta _{\textsc {Aln}}^{o}\) to computing \(\delta _{\textsc {Aln}}^{u}\) with bounded degrees [6], where AT 1(i) and BT 2(j). Since the degrees of T 1 and T 2 are bounded by some constant, the number of combinations of A and B are bounded, so \(\delta _{\textsc {SgAln}}^{u}(A,B)\) can be computed in polynomial time.

Fig. 14
figure 14

The recurrences of computing \(\delta _{\textsc {SgAln}}^{u}\) for unordered trees with bounded degrees

Furthermore, by replacing diff o in the recurrences of computing \(\tau _{\textsc {BotAln}}^{o}\) (cf., Theorem 9) with diff u and by using the same discussion as above, we can design the recurrences of computing \(\delta _{\textsc {BotAln}}^{u}\) in polynomial time. □

5 Conclusion

In this paper, we have characterized a Tai mapping hierarchy as several common subforests, that is, as an embedded subforest, an induced subforest and a complete subforest by focusing on the connections of nodes in a common subforest and as a non-twisting subforest, a parallel subforest, a subtree and a root-preserving subtree by focusing on the arrangements of subtrees in a common subforest. Then, we have introduced new mappings into the Tai mapping hierarchy illustrated in Fig. 2.

Next, we have investigated the metricity of the variations of the edit distance as the minimum cost of the above mappings and the time complexity of computing them. We summarize the results as Table 1 in Section 1. Also Figs. 15 and 16 illustrate the relationship between the Tai mapping hierarchy and time complexity of computing τ(T 1, T 2) for \(\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\) as diagram forms.

Fig. 15
figure 15

The time complexity of computing \(\tau _{\mathtt {A}}^{o}(T_{1},T_{2})\) for \(\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\). Here, the nodes enclosed by black solid lines, black dashed lines, black dotted lines and gray solid lines illustrate that the problem of computing \(\tau _{\mathtt {A}}^{o}(T_{1},T_{2})\) is \(O(nm^{2}(1+\log \frac {n}{m}))\) time, O(n m D 2) time, O(n m) time and O(n + m) time, respectively

Fig. 16
figure 16

The time complexity of computing \(\tau _{\mathtt {A}}^{u}(T_{1},T_{2})\) for \(\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\). Here, the nodes enclosed by black solid lines, black dashed lines, black dotted lines, gray solid lines and gray dashed lines illustrate that the problem of computing \(\tau _{\mathtt {A}}^{o}(T_{1},T_{2})\) is MAX SNP-hard even if T 1 and T 2 are binary, MAX SNP-hard but tractable if the degrees of T 1 and T 2 are bounded, O(n m d) time, O(n m) time and O(n + m) time, respectively