Tai Mapping Hierarchy for Rooted Labeled Trees Through Common Subforest

Yoshino, Takuya; Hirata, Kouichi

doi:10.1007/s00224-016-9705-1

Tai Mapping Hierarchy for Rooted Labeled Trees Through Common Subforest

Published: 02 September 2016

Volume 60, pages 759–783, (2017)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Theory of Computing Systems Aims and scope Submit manuscript

Tai Mapping Hierarchy for Rooted Labeled Trees Through Common Subforest

Download PDF

Takuya Yoshino¹ &
Kouichi Hirata¹

232 Accesses
9 Citations
Explore all metrics

Abstract

A Tai mapping between two rooted labeled trees (trees, for short) is a one-to-one node correspondence preserving ancestors and siblings (if trees are ordered). The variations of the Tai mapping are known to provide a hierarchy, called a Tai mapping hierarchy. In this paper, we characterize the Tai mapping hierarchy as a common subforest by focusing on the connections of nodes and the arrangements of subtrees in a common subforest. Then, we fill a gap in the Tai mapping hierarchy by introducing several new variations. Furthermore, we summarize and investigate the time complexity of computing the variations of the edit distance as the minimum cost of the variations of the Tai mapping.

Earth Mover’s Distance Between Rooted Labeled Unordered Trees Formulated from Complete Subtrees

Anchored Alignment Problem for Rooted Labeled Trees

Sakura: A Flexible Coding for Tree Hashing

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Comparing tree-structured data such as HTML and XML data for web mining or DNA and glycan data for bioinformatics is one of the important tasks for data mining. The most famous distance measure between trees is the edit distance [2, 9, 14]. The edit distance is formulated as the minimum cost of edit operations, consisting of a substitution, a deletion and an insertion, applied to transform from a tree to another tree.

It is known that the edit distance is closely related to the notion of a Tai mapping (mapping, for short) [14], which is a one-to-one node correspondence between trees preserving ancestor (and sibling) relations. Note that the corresponding nodes with different labels through a mapping are regarded as nodes applied to substitution and the non-corresponding nodes to deletion in a tree or insertion in another tree. Then, the minimum cost of possible Tai mappings coincides with the edit distance [14].

Whereas the edit distance is the standard measure for comparing trees, it is too general for several applications. Therefore, more structurally sensitive distances of the edit distance are required for these applications. Such distances are formulated as the minimum cost of the variations of the Tai mapping such as a top-down mapping (Top) [3, 13], an LCA (least common ancestor) -preserving (or degree-2) mapping (Lca) [22], an accordant (or Lu’s) mapping (Acc) [9, 11], an isolated-subtree (or constrained) mapping (Ilst) [17, 19, 20], a less-constrained mapping (Less) [10], an alignable mapping (Aln) [9], a bottom-up mapping (Bot) [9, 15, 18], a segmental mapping (Sg) [7] and a top-down segmental mapping (TopSg) [7], respectively.

The above mappings provide a Tai mapping hierarchy illustrated in Fig. 1, which grows from left to right by adding new variations of the Tai mapping. Here, Iso denotes an isomorphism and it holds that “Aln = Less” [9] and “Top = TopSg” [7].

The Tai mapping hierarchy in Fig. 1 just represents the inclusion relation of mappings but no other properties of mappings and, in particular, the right column in Fig. 1 (left) is sparser than the left column. Then, there arise a question whether or not there exists a unifying property between the variations of a Tai mapping.

In order to solve the above question, in this paper, we characterize the Tai mapping hierarchy as a common subforest between two trees consisting of pairs of nodes in the variations of a Tai mapping. Then, we focus on the connections of nodes and the arrangements of subtrees in a common subforest. Note that the largest common subforest or tree provides another similarity measure between trees [1, 5, 9, 16, 21].^{Footnote 1}

First, we focus on the following three connections of nodes in a common subforest. A common embedded subforest is a common subforest by connecting nodes in a mapping according to ancestor-descendant relation. A common induced subforest is a common subforest induced by a set of nodes. A common complete subforest is a common induced subforest containing all the descendants of a set of nodes. Then, we can characterize the mappings in Tai, Aln, Ilst, Acc and Lca as common embedded subforests, the mappings in Sg and Top as common induced subforests and mappings in Bot as common complete subforests, respectively. The mappings characterized as common induced subforests require the operational condition that the deletion and the insertion are allowed to apply just roots or leaves and the mappings characterized as common complete subforests require the operational condition that the deletion and the insertion are allowed to apply just roots.

Next, we focus on the four arrangements of subtrees in a common subforest. A common subforest is non-twisting if there exists no triple of nodes in a common subforest such that the LCA of the first and the second nodes is an ancestor of the LCA of the second and third nodes in a tree iff the LCA of the second and the third nodes is an ancestor of the LCA of the first and second nodes in another tree. A common subforest is parallel if, for every triple of nodes in a common subforest, the LCA of the first and the second nodes is equal to the LCA of the first and the third nodes in a tree iff the LCA of the first and the second nodes is equal to the LCA of the first and the third nodes in another tree. A common subforest is a subtree if it is a tree. A common subtree is root-preserving if the root of the common subtree is equal to the roots of both trees. Then, we can characterize the mappings in Tai, Sg and Bot as an arbitrary subforest, the mappings in Aln as a non-twisting subforest, the mappings in Ilst and Acc as a parallel subforest, the mappings in Lca as a subtree and the mappings in Top and Iso as a root-preserving subtree, respectively.

As a result, the Tai mapping hierarchy in Fig. 1 is not complete for the connections of nodes and the arrangements of subtrees in a common subforest. In order to fill a gap in the Tai mapping hierarchy in Fig. 1, in this paper, we introduce new mappings by intersecting Sg and Bot in the right column in Fig. 1 with Aln, Ilst, Acc and Lca in the left column in Fig. 1.

As intersecting Sg, we introduce a segmental alignable mapping (SgAln), an isolated-subtree segmental mapping (IlstSg), an accordant segmental mapping (AccSg) and an LCA-preserving segmental mapping (LcaSg). As intersecting Bot, we introduce a bottom-up alignable mapping (BotAln), an isolated-subtree bottom-up mapping (IlstBot), an accordant bottom-up mapping (AccBot) and an LCA-preserving bottom-up mapping (LcaBot). Additionally, we introduce an LCA- and root-preserving mapping (LcaRt).

Hence, we provide a new Tai mapping hierarchy illustrated in Fig. 2, where the previous mappings in Fig. 1 are illustrated as the nodes enclosed by gray lines and the new mappings are illustrated as the nodes enclosed by black lines. In particular, we show that “AccSg = IlstSg” and “AccBot = IlstBot.”

In the vertical direction in Fig. 2, the mappings in the left column (Tai, Aln, Ilst, Acc, Lca and LcaRt), those in the center column (Sg, SgAln, AccSg, LcaSg and Top) and those in the right column are characterized as common embedded, induced and complete subforests, respectively. In the horizontal direction in Fig. 2, on the other hand, the mappings in the top row (Tai, Sg and Bot), those in the second row (Aln, SgAln and BotAln), those in the third row (Ilst, Acc, AccSg and AccBot), those in the fourth row (Lca, LcaSg, LcaBot) and those in the last row (LcaRt, Top and Iso) are characterized as arbitrary subforests, non-twisting subforests, parallel subforests, subtrees and root-preserving subtrees, respectively.

Next, we introduce the variations of the edit distance as the minimum cost of all the possible mappings in SgAln, AccSg, LcaSg, BotAln, AccBot, LcaBot and LcaRt, where we call them a segmental alignment distance, an accordant segmental distance, an LCA-preserving segmental distance, a bottom-up alignment distance, an accordant bottom-up distance, an LCA-preserving bottom-up distance and an LCA- and root-preserving distance. Then, we show that no distances characterized as non-twisting subforests are metrics, whereas the other distances are metrics.

Finally, we summarize and investigate the time complexity of computing the variations of the edit distance illustrated in Table 1. Here, n is the number of nodes in a tree, m is the number of nodes in another tree (n ≥ m), D is the maximum degree of two trees and d is the minimum degree of two trees.

Table 1 The metricity of the variations τ_𝖠(T ₁, T ₂) for ℳ_𝖠(T ₁, T ₂) in Fig. 2 and time complexity of computing them

Full size table

For ordered trees, we can compute the distances characterized as non-twisting subforests in O(n m D ²) time and those as other subforests or subtrees except the (standard) edit distance and the tree isomorphism in O(n m) time. For unordered trees, it is known that the problem of computing the distances characterized as arbitrary subforests is MAX SNP-hard [21] even if trees are binary [5, 18]. On the other hand, as in the case of alignment distance [6], the problems of computing the distances characterized as non-twisting subforests are MAX SNP-hard, whereas they are tractable if the degrees of trees are bounded by some constant. Furthermore, we can compute the distances characterized as parallel subforests, subtrees and root-preserving subtrees in O(n m d) time except the LCA-preserving bottom-up distance, which we can compute in O(n m) time.

2 Edit Distance and Tai Mapping

A tree T is a connected graph (V, E) without cycles, where V is the set of vertices and E is the set of edges and we denote V and E by V(T) and E(T). The size of T is |V| and denoted by |T|. We sometime denote v ∈ V(T) by v ∈ T. We denote an empty tree (∅, ∅) by ∅. A rooted tree is a tree with one node r chosen as its root. We denote the root of a rooted tree T by r(T).

For each node v in a rooted tree with the root r, let UP _r(v) be the unique path from v to r. The parent of v( ≠ r), which we denote by par(v), is its adjacent node on UP _r(v) and the ancestors of v( ≠ r) are the nodes on UP _r(v) − {v}. We denote the set of all ancestors of v by anc(v). We say that u is a child of v if v is the parent of u and u is a descendant of v if v is an ancestor of u. A leaf is a node having no children. We denote the set of all leaves in T by lv(T). A node neither a leaf nor a root is called an internal node. The degree of a node v, denoted by d(v), is the number of children of v. The degree of a rooted tree T, denoted by d(T), is the maximum number of d(v) for every v ∈ T.

We use the ancestor orders < and ≤ , that is, u < v if v is an ancestor of u and u ≤ v if u < v or u = v. We denote neither u ≤ v nor v ≤ u by u # v. We say that w is the least common ancestor (LCA, for abbreviation) of u and v, denoted by u ⊔ v, if u ≤ w, v ≤ w and there exists no w′ such that w′ < w, u ≤ w′ and v ≤ w′. A (complete) subtree of T = (V, E)rooted by v, denoted by T[v], is a tree T′ = (V′, E′) such that r(T′) = v, V′ = {u ∈ V∣u ≤ v} and E′ = {(u, w) ∈ E∣u, w ∈ V′}.

For nodes u, v ∈ T, u is to the left of v, denoted by u ≼ v, if pre(u) ≤ pre(v) for the preorder number pre and post(u) ≤ post(v) for the postorder number post. We say that a rooted tree is ordered if a left-to-right order among siblings is given; unordered otherwise. We say that a rooted tree is labeled if each node is assigned a symbol from a fixed finite alphabet Σ. For a node v, we denote the label of v by l(v), and sometimes identify v with l(v). We call a rooted labeled tree a tree simply.

A forest is a sequence [T ₁, … , T _n] of trees. For a tree T and a node i ∈ T, T(i) is a forest obtained by deleting the root i in T[i]. We denote the number of trees in a forest F by ||F||, that is, ||F|| = n for F = [T ₁, … , T _n].

In the following sections, we will use two trees T ₁ and T ₂ to obtain the distance between T ₁ and T ₂. Then, for nodes i ∈ T ₁ and j ∈ T ₂, suppose that the children of i and j are i ₁, … , i _s and j ₁, … , j _t, respectively. Also, we sometimes denote the forests T ₁(i) = [T ₁[i ₁], … , T ₁[i _s]] and T ₂(j) = [T ₂[j ₁], … , T ₂[j _t]] by F ₁(i ₁, i _s) and F ₂(j ₁, j _t), respectively.

Definition 1

(Edit operations 14) The edit operations of a tree T are defined as follows. See Fig. 3.

1.
Substitution: Change the label of the node v in T.
2.
Deletion:^{Footnote 2} Delete a node v in T with parent v′, making the children of v become the children of v′. The children are inserted in the place of v as a subsequence in the left-to-right order (for ordered trees) or a subset (for unordered trees) of the children of v′. In particular, if v is the root in T, then the result applying the deletion is a forest consisting of the children of the root.
3.
Insertion: The complement of deletion. Insert a node v as a child of v′ in T making v the parent of a consecutive subsequence (for ordered trees) or a subset (for unordered trees) of the children of v′.

Let ε ∉ Σ denote a special blank symbol and define Σ_ε = Σ ∪ {ε}. Then, we represent each edit operation by (l ₁ ↦ l ₂), where (l ₁, l ₂) ∈ (Σ_ε × Σ_ε − {(ε, ε)}). The operation is a substitution if l ₁ ≠ ε and l ₂ ≠ ε, a deletion if l ₂ = ε, and an insertion if l ₁ = ε. For nodes v and w, we also denote (l(v) ↦ l(w)) by (v ↦ w). We define a cost function γ : (Σ_ε × Σ_ε − {(ε, ε)}) ↦ R ⁺ on pairs of labels. We often constrain a cost function γ to be a metric, that is, γ(l ₁, l ₂) ≥ 0, γ(l ₁, l ₂) = 0 iff l ₁ = l ₂, γ(l ₁, l ₂) = γ(l ₂, l ₁) and γ(l ₁, l ₃) ≤ γ(l ₁, l ₂) + γ(l ₂, l ₃). We call the cost function that γ(l ₁, l ₂) = 1 if l ₁ ≠ l ₂ a unit cost function.

Definition 2

(Edit distance 14) For a cost function γ, the cost of an edit operation e = l ₁ ↦ l ₂ is given by γ(e) = γ(l ₁, l ₂). The cost of a sequence E = e ₁, … , e _k of edit operations is given by $\gamma (E) = {\sum }_{i=1}^{k}\gamma (e_{i})$. Then, an edit distance τ _Tai(T ₁, T ₂) between trees T ₁ and T ₂ is defined as follows:

$$\begin{array}{l} \tau_{\textsc{Tai}}(T_{1},T_{2}) =\min\left\{ \gamma(E)\left| \begin{array}{l} E\text{ is a sequence } \text{of edit operations}\\ \text{transforming }T_{1}\text{ to }T_{2} \end{array} \right.\right\}. \end{array} $$

Definition 3

(Tai mapping 14) Let T ₁ and T ₂ be trees. We say that a triple (M, T ₁, T ₂) is an ordered Tai mapping between T ₁ and T ₂ if M ⊆ V(T ₁) × V(T ₂) and every pair (u ₁, v ₁) and (u ₂, v ₂) in M satisfies that (1) u ₁ = u ₂ iff v ₁ = v ₂ (one-to-one condition), (2) u ₁ ≤ u ₂ iff v ₁ ≤ v ₂ (ancestor condition) and (3) u ₁ ≼ u ₂ iff v ₁ ≼ v ₂ (sibling condition). Also we say that a triple (M, T ₁, T ₂) is an unordered Tai mapping if M ⊆ V(T ₁) × V(T ₂) and every pair (u ₁, v ₁) and (u ₂, v ₂) in M satisfies (1) one-to-one condition and (2) ancestor condition. The set of all possible Tai mappings between T ₁ and T ₂ is denoted by $\mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})$. We will use M instead of (M, T ₁, T ₂) when there is no confusion, call both an ordered and an unordered Tai mapping a mapping.

In particular, we say that a mapping M is an inclusion mapping from T ₁ to T ₂ if, for every node v ∈ T ₁, there exists a node w ∈ T ₂ such that (v, w) ∈ M. In this case, we denote w by M(v).

Let M be a mapping between T ₁ and T ₂. Let I _M and J _M be the sets of nodes in T ₁ and T ₂ but not in M, that is, I _M = {u ∈ T ₁∣(u, v) ∉ M} and J _M = {v ∈ T ₂∣(u, v) ∉ M}. Then, the cost γ(M) of M is given as follows.

$$\gamma(M) =\sum\limits_{(u,v)\in M}\gamma(u,v)+ \sum\limits_{u\in I_{M}}\gamma(u,\varepsilon)+\sum\limits_{v\in J_{M}}\gamma(\varepsilon,v). $$

Theorem 1

(Tai 14) $\tau _{\textsc {Tai}}(T_{1},T_{2}) = \min \{\gamma (M)\mid M\in \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})\}$.

Finally, we introduce the notion of a subforest, a common subforest and their arrangement of subtrees illustrated in Fig. 2. Note that we ignore labels of nodes.

Definition 4

(Subforest, common subforest) Let T = (V, E) be a tree.

1.
We say that F = (V′, E′) is an embedded subforest in T if V′ ⊆ V and, for every u, v ∈ V′, (u, v) ∈ E′ if u < v and there exists no w ∈ V′ such that u < w and w < v.
2.
We say that F = (V′, E′) is an induced subforest in T if V′ ⊆ V and, for every u, v ∈ V′, (u, v) ∈ E′ if (u, v) ∈ E.
3.
We say that F = (V′, E′) is a complete subforest in T if F is an induced subforest in T and, for every v ∈ V′, all the descendants of v is in V′.

Furthermore, for trees T ₁ and T ₂, we say that F is a common embedded (resp., induced, complete) subforest between T ₁ and T ₂ if F is an embedded (resp., induced, complete) subforest in both T ₁ and T ₂. We call the above three common subforests a common subforest if it is not necessary to distinguish them.

Definition 5

(Arrangement of subtrees in a common subforest) Let F be a common subforest between T ₁ and T ₂. For a node v ∈ F, we denote v ∈ T ₁ ∩ F and v ∈ T ₂ ∩ F by v ¹ and v ², respectively.

1.
We say that F is twisting if there exist u, v, w ∈ F such that u ¹ ⊔ v ¹ < v ¹ ⊔ w ¹ in T ₁ and v ² ⊔ w ² < u ² ⊔ v ² in T ₂. Otherwise F is non-twisting.
2.
We say that F is parallel if, for every u, v, w ∈ F, u ¹ ⊔ v ¹ = u ¹ ⊔ w ¹ in T ₁ iff u ² ⊔ v ² = u ² ⊔ w ² in T ₂.
3.
We say that F is a subtree if F is a tree, that is, ||F|| = 1.
4.
We say that F is a root-preserving subtree if F is a tree T such that r(T) = r(T ₁) = r(T ₂).

Example 1

Consider trees T ₁ and T ₂ in Fig. 4. Then, F ₁, F ₂ and F ₃ illustrated in the rightmost column in Fig. 4 are the common embedded subforest, the common induced subforest and the common complete subforest between T ₁ and T ₂, respectively. Here, every v ¹ in T ₁ (resp., v ² in T ₂) for a node v ∈ F _i (i = 1, 2, 3) is illustrated as gray nodes. Also F ₁ is a root-preserving subtree and F ₂ and F ₃ are parallel subforests.

3 Tai Mapping Hierarchy

In this section, we introduce the variations of a Tai mapping.

Definition 6

(Variations of mapping) Let T ₁ and T ₂ be trees and $M\in \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})$. We define M ⁻ as M − {(r(T ₁), r(T ₂))}.

1.
We say that M is a less-constrained mapping [10], denoted by $M\in \mathcal {M}_{\textsc {Less}}(T_{1},T_{2})$, if M satisfies the following condition.
$$\forall (u_{1},v_{1}),(u_{2},v_{2}),(u_{3},v_{3})\in M \left( u_{1}\sqcup u_{2}<u_{1}\sqcup u_{3}\Longrightarrow v_{2}\sqcup v_{3}=v_{1}\sqcup v_{3}\right). $$
2.
We say that M is an isolated-subtree mapping [17] (or a constrained mapping [19]), denoted by $M\in \mathcal {M}_{\textsc {Ilst}}(T_{1},T_{2})$, if M satisfies the following condition.
$$\forall (u_{1},v_{1}),(u_{2},v_{2}),(u_{3},v_{3})\in M \left( u_{3}<u_{1}\sqcup u_{2}\iff v_{3}<v_{1}\sqcup v_{2}\right). $$
3.
We say that M is an accordant mapping [9] (or a Lu’s mapping [11]), denoted by $M\in \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})$, if M satisfies the following condition.
$$\forall (u_{1},v_{1}),(u_{2},v_{2}),(u_{3},v_{3})\in M \left( u_{1}\sqcup u_{2}=u_{1}\sqcup u_{3} \iff v_{1}\sqcup v_{2}=v_{1}\sqcup v_{3} \right). $$
4.
We say that M is an LCA-preserving mapping (or a degree-2 mapping [22]), denoted by $M\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$, if M satisfies the following condition.
$$\forall (u_{1},v_{1}),(u_{2},v_{2})\in M^{-} \left( (u_{1}\sqcup u_{2},v_{1}\sqcup v_{2})\in M\right). $$
5.
We say that M is an LCA- and root-preserving mapping, denoted by $M\in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})$, if $M\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$ and (r(T ₁), r(T ₂)) ∈ M.
6.
We say that M is a top-down mapping [3, 13] (or a degree-1 mapping), denoted by $M\in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})$, if M satisfies the following condition.
$$\forall (u,v)\in M^{-} \left( (\textit{par}(u),\textit{par}(v))\in M\right). $$
7.
We say that M is a bottom-up mapping [9, 15, 18],^{Footnote 3} denoted by $M\in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})$, if M satisfies the following condition.
$$\forall (u,v)\in M \left( \begin{array}{l} \forall u^{\prime}\in T_{1}[u] \exists v^{\prime}\in T_{2}[v]\left( (u^{\prime},v^{\prime})\in M\right)\\ \wedge \forall v^{\prime}\in T_{2}[v] \exists u^{\prime}\in T_{1}[u]\left( (u^{\prime},v^{\prime})\in M\right) \end{array} \right). $$
8.
We say that M is a segmental mapping [7], denoted by $M\in \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})$, if M satisfies the following condition.
$$\forall (u,v)\in M^{-} \left( \begin{array}{l} \exists (u^{\prime},v^{\prime})\in M \biggl(\left( u^{\prime}\in\textit{anc}(u)\right) \wedge \left( v^{\prime}\in\textit{anc}(v)\right) \biggr)\\ \qquad \Longrightarrow \left( (\textit{par}(u),\textit{par}(v))\in M\right) \end{array} \right). $$
9.
We say that M is a top-down segmental mapping [7], denoted by $M\in \mathcal {M}_{\textsc {TopSg}}(T_{1},T_{2})$, if M is a segmental mapping such that (r(T ₁), r(T ₂)) ∈ M.
10.
We say that a mapping M is an alignable mapping [9], denoted by $M\in \mathcal {M}_{\textsc {Aln}}(T_{1},T_{2})$, if there exist a forest F, an inclusion mapping M ₁ from T ₁ to F and an inclusion mapping M ₂ from T ₂ to F satisfying that M ₁(u) = M ₂(v) for every (u, v) ∈ M.^{Footnote 4}
11.
$\mathcal {M}_{\mathtt {AB}}(T_{1},T_{2}) = \mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\cap \mathcal {M}_{\mathtt {B}}(T_{1},T_{2})$ for mappings A and B. This paper deals with $\mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})$, $\mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})$, $\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$, $\mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})$, $\mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})$, $\mathcal {M}_{\textsc {IlstBot}}(T_{1},T_{2})$, $\mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2})$ and $\mathcal {M}_{\textsc {LcaBot}}(T_{1},T_{2})$.

Example 2

Figure 5 illustrates mappings M _i (1 ≤ i ≤ 8) [7, 9] such that $M_{1}\in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})$ but $M_{1}\not \in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})$; $M_{2}\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$ but $M_{2}\not \in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})$; $M_{3}\in \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})$ but $M_{3}\not \in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$; $M_{4}\in \mathcal {M}_{\textsc {Ilst}}(T_{1},T_{2})$ but $M_{4}\not \in \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})$; $M_{5}\in \mathcal {M}_{\textsc {Aln}}(T_{1},T_{2})$ but $M_{5}\not \in \mathcal {M}_{\textsc {Ilst}}(T_{1},T_{2})$; $M_{6}\in \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})$ but $M_{6}\not \in \mathcal {M}_{\textsc {Aln}}(T_{1},T_{2})$; $M_{7}\in \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})$ but $M_{7}\not \in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})$ and $M_{7}\not \in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})$; $M_{8}\in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})$ but $M_{8}\not \in \mathcal {M}_{\textsc {Aln}}(T_{1},T_{2})$.

In particular, for a segmental mapping $M\in \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})$, the formula in Definition 6.8 claims that, for (u, v),(u′, v′) ∈ M such that u′ ∈ anc(u) and v′ ∈ anc(v), there exist nodes u ₁, … , u _k ∈ T ₁ and v ₁, … , v _k ∈ T ₂ such that u ₁ = u, u _k = u′, v ₁ = v, v _k = v′, u _i+1 = par(u _i), v _i+1 = par(v _i) (1 ≤ i ≤ k − 1) and (u _i, v _i) ∈ M (1 ≤ i ≤ k).

The following lemma has been shown in [7, 9, 17] or holds from Definition 6.

Lemma 1

The following statements hold.

1.
$\mathcal {M}_{\textsc {Iso}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2}) \subset \mathcal {M}_{\textsc {Ilst}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Aln}} (T_{1},T_{2})\subset \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})$ [9, 17].
2.
$\mathcal {M}_{\textsc {Iso}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2}) \subset \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Tai}}(T_{1},T_{2})$ [7].
3.
$\mathcal {M}_{\mathtt {AB}}(T_{1},T_{2})\subseteq \mathcal {M}_{\mathtt {A}}(T_{1},T_{2})$ for mappings A and B.
4.
If $\mathcal {M}_{\mathtt {B}}(T_{1},T_{2})\subseteq \mathcal {M}_{\mathtt {C}}(T_{1},T_{2})$, then $\mathcal {M}_{\mathtt {AB}}(T_{1},T_{2})\subseteq \mathcal {M}_{\mathtt {AC}}(T_{1},T_{2})$ for mappings A, B and C.

Theorem 2

The hierarchy illustrated in Fig. 2 holds, where the mapping $\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})$ is denoted by its subscript A. In particular, the following statements hold. Here, S # S′ denotes that neither S ⊆ S′ nor S′ ⊆ S for two sets S and S′.

1.
$\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})$ and $\mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {IlstBot}}(T_{1},T_{2})$
2.
$\mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})~\#~\mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})$ , $\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$ and $\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$
3.
$\mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$ , $\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})$ and $\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})$.
4.
$\mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})~\#~\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$.
5.
$\mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})$ and $\mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})$.
6.
$\mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})$.
7.
$\mathcal {M}_{\textsc {LcaBot}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})$.

Proof

1.
We just show that $\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})$. We can show that $\mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {IlstBot}}(T_{1},T_{2})$ in the similar way. Since Lemma 1 implies that $\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})$, we just show the converse direction that $\mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$.

Suppose that $M\in \mathcal {M}_{\textsc {IlstSg}}(T_{1},T_{2})$ and let (u ₁, v ₁),(u ₂, v ₂),(u ₃, v ₃) ∈ M. Then, it holds that u ₃ < u ₁ ⊔ u ₂ ⇔ v ₃ < v ₁ ⊔ v ₂. Also suppose that $M\not \in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$. Consider the case that there exist (u ₁, v ₁),(u ₂, v ₂),(u ₃, v ₃) ∈ M such that u ₁ ⊔ u ₂ = u ₁ ⊔ u ₃ but v ₁ ⊔ v ₂ ≠ v ₁ ⊔ v ₃.

Suppose that u ₁ # u ₂, u ₂ # u ₃ and u ₃ # u ₁. Since M is a mapping, it holds that v ₁ # v ₂, v ₂ # v ₃ and v ₃ # v ₁. Since v ₁ ⊔ v ₂ ≠ v ₁ ⊔ v ₃ and T ₂ is a tree, it holds that either (1) v ₁ ⊔ v ₂ < v ₁ ⊔ v ₃ or (2) v ₁ ⊔ v ₃ < v ₁ ⊔ v ₂ in T ₂. (See Fig. 6).

For the case (1), it holds that v ₃ ≮ v ₁ ⊔ v ₂, which is a contradiction that u ₃ < u ₁ ⊔ u ₂ ⇔ v ₃ < v ₁ ⊔ v ₂. For the case (2), it holds that v ₂ ≮ v ₁ ⊔ v ₃, which is a contradiction that u ₂ < u ₁ ⊔ u ₃ ⇔ v ₂ < v ₁ ⊔ v ₃. (Note that this discussion holds for a non-segmental mapping).

Suppose that u ₁ # u ₂, u ₁ < u ₃ and u ₂ < u ₃. Since u ₁ ⊔ u ₂ = u ₁ ⊔ u ₃, it holds that u ₁ ⊔ u ₂ = u ₃ in T ₁. On the other hand, since v ₁ ⊔ v ₂ ≠ v ₁ ⊔ v ₃, it holds that v ₁ ⊔ v ₂ < v ₁ ⊔ v ₃ = v ₃ in T ₂. (See the case (3) in Fig. 6).

Since M is a segmental mapping, there exists a node u′ ∈ T ₁ such that (u′, v ₁ ⊔ v ₂) ∈ M. Since v ₁ ⊔ v ₂ < v ₃, it holds that u′ < u ₃ = u ₁ ⊔ u ₂. On the other hand, since v ₁ < v ₁ ⊔ v ₂ and v ₂ < v ₁ ⊔ v ₂, it holds that u ₁ < u′ and u ₂ < u′, that is, u ₃ = u ₁ ⊔ u ₂ ≤ u′, which is a contradiction. The above discussion also holds for the case that there exist (u ₁, v ₁),(u ₂, v ₂),(u ₃, v ₃) ∈ M such that v ₁ ⊔ v ₂ = v ₁ ⊔ v ₃ but u ₁ ⊔ u ₂ ≠ u ₁ ⊔ u ₃.

Hence, for every (u ₁, v ₁),(u ₂, v ₂),(u ₃, v ₃) ∈ M, it holds that u ₁ ⊔ u ₂ = u ₁ ⊔ u ₃ ⇔ v ₁ ⊔ v ₂ = v ₁ ⊔ v ₃, which implies that $M\in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$.

In order to show the other statements, it is necessary to show the inclusion and the properness.

For the inclusion, by Lemma 1, it is sufficient to show that $\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})$ and $\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})$. Since (r(T ₁), r(T ₂)) ∈ M for every $M\in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})$ and $\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subset \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$, it holds that $M\in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})$. Since $\mathcal {M}_{\textsc {Top}}(T_{1},T_{2}) = \mathcal {M}_{\textsc {TopSg}}(T_{1},T_{2})$ [7] and $\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$, it holds that $\mathcal {M}_{\textsc {Top}}(T_{1},T_{2})\subseteq \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})$ by Lemma 1.4.

On the other hand, we show the properness by using the mappings M _i (1 ≤ i ≤ 9) in Fig. 7.
2.
It holds that $M_{1}\in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})$ but $M_{1}\not \in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})$ and $M_{1}\not \in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})$; $M_{2}\in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})$ but $M_{2}\not \in \mathcal {M}_{\textsc {Top}}(T_{1},T_{2})$ and $M_{2}\not \in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})$; $M_{3}\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$ but $M_{3}\not \in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})$ and $M_{3}\not \in \mathcal {M}_{\textsc {LcaRt}}(T_{1},T_{2})$.
3.
It holds that $M_{4}\in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$ but $M_{4}\not \in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})$; $M_{5}\in \mathcal {M}_{\textsc {Acc}}(T_{1},T_{2})$ but $M_{5}\not \in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$; $M_{6}\in \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})$ but $M_{6}\not \in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$.
4.
It holds that $M_{3}\in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$ but $M_{3}\not \in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$; $M_{4}\in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$ but $M_{4}\not \in \mathcal {M}_{\textsc {Lca}}(T_{1},T_{2})$.
5.
It holds that $M_{7}\in \mathcal {M}_{\textsc {Sg}}(T_{1},T_{2})$ but $M_{7}\not \in \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})$. Also it holds that $M_{7}\in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})$ but $M_{7}\not \in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})$.
6.
It holds that $M_{8}\in \mathcal {M}_{\textsc {SgAln}}(T_{1},T_{2})$ but $M_{8}\not \in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})$.
7.
It holds that $M_{9}\in \mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2})$ but $M_{9}\not \in \mathcal {M}_{\textsc {LcaBot}}(T_{1},T_{2})$; $M_{6}\in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})$ but $M_{6}\not \in \mathcal {M}_{\textsc {AccBot}}(T_{1},T_{2})$; $M_{7}\in \mathcal {M}_{\textsc {Bot}}(T_{1},T_{2})$ but $M_{7}\not \in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})$. □

Remark 1

We can characterize a common subforest between T ₁ and T ₂ by using a mapping M. Recall that I _M and J _M are the sets of nodes in T ₁ and T ₂ but not in M. Then, for T ₁ = (V ₁, E ₁) and T ₂ = (V ₂, E ₂), M induces a common subforest F = (V′, E′) between T ₁ and T ₂ such that V′ = V ₁ − I _M = V ₂ − J _M. We call such a common subforest a common subforest through M.

Since the segmental mapping requires to preserve the parent-children relationship as possible, a common subforest through the segmental mapping and its sub-mappings, that is, Sg, SgAln, AccSg, LcaSg and Top, is an induced subforest. These mappings require the operational condition that the deletion and the insertion are allowed to apply just leaves or roots. In particular, the mappings in Top require the operational condition that the deletion and the insertion are allowed to apply just leaves.

Since the bottom-up mapping requires to preserve the complete subtrees, a common subforest through the bottom-up mapping and its sub-mappings, that is, Bot, BotAln, AccBot, LcaBot and Iso, is a complete subforest. These mappings require the operational condition that the deletion and the insertion are allowed to apply just roots.

Remark 2

Since the LCA-preserving mapping provides the root as an LCA of all the nodes in a common subforest, the common subforest through the LCA-preserving mapping and its sub-mappings, that is, Lca, LcaSg, LcaBot, LcaRt, Top and Iso, is a subtree. In particular, the LCA- and root-preserving mapping contains a pair of roots in given trees, the common subforest through the LCA- and root-preserving mapping and its sub-mappings, that is, LcaRt, Top and Iso, is a root-preserving subtree.

Since the definition of the accordant mapping is same as a parallel common forest, the common subforest through the accordant mapping and its sub-mappings, that is, Acc, AccSg and AccBot, is parallel.

As similar as a twisting common subforest, we say that a mapping M between trees T ₁ and T ₂ has a twist if there exist pairs (u ₁, v ₁),(u ₂, v ₂),(u ₃, v ₃) ∈ M satisfying the following condition.

$$(u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \wedge (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2}). $$

We call such pairs (u ₁, v ₁),(u ₂, v ₂) and (u ₃, v ₃) a twist of M.

Since (v ₁ ⊔ v ₂ ≤ v ₂ ⊔ v ₃) ≡ (v ₁ ⊔ v ₃ = v ₂ ⊔ v ₃) and ¬(v ₂ ⊔ v ₃ < v ₁ ⊔ v ₂) ≡ (v ₁ ⊔ v ₂ ≤ v ₂ ⊔ v ₃) for every tree T and v ₁, v ₂, v ₃ ∈ T, we can transform the formula representing the nonexistence of twists in M logically as follows.

$$\begin{array}{ll} & \neg\exists (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \wedge (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \neg\left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \wedge (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( \neg (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \vee\neg (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \Longrightarrow\neg (v_{2}\sqcup v_{3}<v_{1}\sqcup v_{2})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \Longrightarrow (v_{1}\sqcup v_{2}\le v_{2}\sqcup v_{3})\right)\\ \equiv & \forall (u_{1},v_{1})(u_{2},v_{2})(u_{3},v_{3})\in M \left( (u_{1}\sqcup u_{2}<u_{2}\sqcup u_{3}) \Longrightarrow (v_{1}\sqcup v_{3}= v_{2}\sqcup v_{3})\right). \end{array} $$

Hence, a mapping M is less-constrained if and only if M has no twists. As a result, the common subforest through the alignable mapping and its sub-mappings, that is, Aln, SgAln and BotAln, is non-twisting.

Definition 7

(Variations of edit distance) For every $\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})$ in a mapping hierarchy in Fig. 2, the variation τ _A(T ₁, T ₂) of the edit distance τ _Tai is defined as the minimum cost of mappings in $\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})$:

$\tau _{\mathtt {A}}(T_{1},T_{2}) = \min \{\gamma (M)\mid M\in \mathcal {M}_{\mathtt {A}}(T_{1},T_{2})\}$.

Remark 3

If the cost function is a metric, then it holds that τ _LcaRt(T ₁, T ₂) = τ _Lca(T ₁, T ₂) = τ _Acc(T ₁, T ₂) [9]. On the other hand, consider the trees T ₁ and T ₂ in Fig. 8. Then, we obtain the minimum cost mapping $M_{1}\in \mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$ and the minimum cost mapping $M_{2}\in \mathcal {M}_{\textsc {LcaSg}}(T_{1},T_{2})$ under the unit cost function γ in Fig. 8. Hence, it holds that τ _AccSg(T ₁, T ₂) = γ(M ₁) = 4 < 6 = τ _LcaSg(T ₁, T ₂) = γ(M ₂).

It is known that τ _Tai, τ _Ilst, τ _Lca( = τ _Acc), τ _Top, τ _Sg, and τ _Bot are metrics, whereas τ _Aln is not [7, 9]. In the remainder of this section, we investigate whether or not τ _A is a metric for the newly introduced mappings A ∈ {SgAln, AccSg, LcaSg, BotAln, AccBot, LcaBot, LcaRt}.

Let $M_{i}\in \mathcal {M}_{\mathtt {A}}(T_{i},T_{i+1})$ for i = 1, 2. Then, we define the composition M ₁∘M ₂ of mappings M ₁ and M ₂ as follows.

$$M_{1}\circ M_{2} = \left\{ \begin{array}{l} (u,w)\\ ~\in V(T_{1})\times V(T_{3}) \end{array} \left| \begin{array}{l} \exists v\in V(T_{2})\\ ~\text{ s.t. } (u,v)\in M_{1}\text{ and }(v,w)\in M_{2} \end{array} \right.\right\}. $$

Lemma 2

(7, 9) Let $M_{i}\in \mathcal {M}_{\mathtt {A}}(T_{i},T_{i+1})$ for i = 1, 2, where:

$$\mathtt{A}\in\{\textsc{Tai},\textsc{Ilst}, \textsc{Acc},\textsc{Lca},\textsc{Top},\textsc{Sg},\textsc{Bot}\}. $$

Then, the following statements hold.

1.
$M_{1}\circ M_{2}\in \mathcal {M}_{\mathtt {A}}(T_{1},T_{3})$.
2.
For a cost function γ that is a metric, it holds that γ(M ₁ ∘M ₂) ≤ γ(M ₁) + γ(M ₂).

Lemma 3

If both $\mathcal {M}_{\mathtt {A}}$ and $\mathcal {M}_{\mathtt {B}}$ satisfy the statements in Lemma 2, then so does $\mathcal {M}_{\textsc {{\tt AB}}}$.

Proof

Suppose that $M_{i}\in \mathcal {M}_{\mathtt {AB}}(T_{i},T_{i+1})$ for i = 1, 2. By the definition that $\mathcal {M}_{\textsc {\tt AB}}(T_{i},T_{i+1}) = \mathcal {M}_{\mathtt {A}}(T_{i},T_{i+1})\cap \mathcal {M}_{\mathtt {B}}(T_{i},T_{i+1})$ , it holds that $M_{i}\in \mathcal {M}_{\textsc {\tt A}}(T_{i},T_{i+1})$ and $M_{i}\in \mathcal {M}_{\mathtt {B}}(T_{i},T_{i+1})$ . By Lemma 2.1 for $\mathcal {M}_{\mathtt {A}}$ and $\mathcal {M}_{\mathtt {B}}$, it holds that $M_{1}\circ M_{2}\in \mathcal {M}_{\mathtt {A}}(T_{1},T_{3})$ and $M_{1}\circ M_{2}\in \mathcal {M}_{\textsc {\tt B}}(T_{1},T_{3})$ . Hence, it is obvious that $M_{1}\circ M_{2}\in \mathcal {M}_{\mathtt {AB}}(T_{1},T_{3})$, so Lemma 2.1 holds for $\mathcal {M}_{\mathtt {AB}}$. Since the proof of Lemma 2.2 just depends on Lemma 2.1, Lemma 2.2 also holds for $\mathcal {M}_{\mathtt {AB}}$. □

Theorem 3

(Metricity) If a cost function is a metric, then so are τ _AccSg , τ _LcaSg , τ _AccBot , τ _LcaBot and τ _LcaRt . On the other hand, neither τ _SgAln nor τ _BotAln is a metric even if a cost function is a metric.

Proof

First, we show that τ _AccSg is a metric. To do this, it is sufficient to show that τ _AccSg(T ₁, T ₂) ≥ 0, τ _AccSg(T ₁, T ₂) = 0 iff T ₁ ≡ T ₂, τ _AccSg(T ₁, T ₂) = τ _AccSg(T ₂, T ₁) and τ _AccSg(T ₁, T ₃) ≤ τ _AccSg(T ₁, T ₂) + τ _AccSg(T ₂, T ₃). The first three statements follow from the definition of $\mathcal {M}_{\textsc {AccSg}}(T_{1},T_{2})$. Let M _i be the minimum cost accordant segmental mapping between T _i and T _i+1 (i = 1,2). By Lemma 2 and 3, it holds that τ _AccSg(T ₁, T ₃) ≤ γ(M ₁∘M ₂) ≤ γ(M ₁) + γ(M ₂) = τ _AccSg(T ₁, T ₂) + τ _AccSg(T ₂, T ₃). By using the same proof, we can show that τ _LcaSg, τ _AccBot and τ _LcaBot are metrics. Also it is obvious that τ _LcaRt is a metric, because τ _Lca is a metric.

On the other hand, in order to show that neither τ _SgAln nor τ _BotAln is a metric, consider the trees T ₁, T ₂ and T ₃ in Fig. 9 and suppose that γ is a unit cost function. Then, we obtain the minimum cost mapping $M_{12}\in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{2})$, $M_{13}\in \mathcal {M}_{\textsc {BotAln}}(T_{1},T_{3})$ and $M_{23}\in \mathcal {M}_{\textsc {BotAln}}(T_{2},T_{3})$ under γ, respectively, in Fig. 9.

Hence, it holds that 8 = γ(M ₁₂) = τ _BotAln(T ₁, T ₂) > τ _BotAln(T ₁, T ₃) + τ _BotAln(T ₂, T ₃) = γ(M ₁₃) + γ(M ₂₃) = 3 + 3 = 6. Since this statement holds for τ _SgAln, neither τ _SgAln nor τ _BotAln satisfies the triangle inequality, neither is a metric. □

4 Time Complexity of Computing Distances

In this section, we investigate the time complexity of computing the variations of the edit distance for both ordered and unordered trees. Here, we identify a node with its postorder number. In the remainder of this paper, let n = |T ₁|, m = |T ₂|, D = max{d(T ₁), d(T ₂)} and d = min{d(T ₁), d(T ₂)}. For a mapping A, we denote the distance between ordered and unordered trees by $\tau _{\mathtt {A}}^{o}$ and $\tau _{\mathtt {A}}^{u}$, respectively, whereas the distance between ordered and unordered forests by $\delta _{\mathtt {A}}^{o}$ and $\delta _{\mathtt {A}}^{u}$, respectively.

4.1 Tractable Cases

In this section, we investigate the tractable cases of computing the variations of the edit distance.

Remark 4

According to [7], for every pair (i, j) ∈ T ₁ × T ₂ (1 ≤ i ≤ n,1 ≤ j ≤ m), we can compute the top-down distance $\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])$ between T ₁[i] and T ₂[j] in O(n m) time [7].

On the other hand, we can compute $\tau _{\textsc {Top}}^{u}(T_{1}[i],T_{2}[j])$ by using the recurrences in Fig. 10 [18]. Here, BM is the maximum weighted bipartite matching with weight ω in a complete bipartite graph G _M = (X, Y, E) such that X and Y are the sets of children of i in T ₁ and j in T ₂ and the weight of an edge (i′, j′) ∈ E is set to $\tau _{\textsc {Top}}^{u}(T_{1}[i^{\prime }],\emptyset )+\tau _{\textsc {Top}}^{u}(\emptyset ,T_{2}[j^{\prime }])- \tau _{\textsc {Top}}^{u}(T_{1}[i^{\prime }],T_{2}[j^{\prime }])$ [18, 22]. Hence, we can store $\tau _{\textsc {Top}}^{u}(T_{1}[i],T_{2}[j])$ for every (i, j) ∈ T ₁ × T ₂ in O(n m d) time.

Theorem 4

We can compute $\tau _{\textsc {LcaRt}}^{o}(T_{1},T_{2})$ in O(nm) time and $\tau _{\textsc {LcaRt}}^{u}(T_{1},T_{2})$ in O(nmd) time.

Proof

We can design the recurrences of computing τ _LcaRt by adding

$$\tau_{\textsc{LcaRt}}(T_{1}[r_{1}],T_{2}[r_{2}]) = \delta_{\textsc{Lca}}(T_{1}(r_{1}),T_{2}(r_{2}))+\gamma(r_{1},r_{2}) $$

to the recurrences of computing τ _Lca [18, 22] (for both ordered and unordered trees), where r _i is the root of T _i (i = 1, 2). Hence, we can compute $\tau _{\textsc {LcaRt}}^{o}(T_{1},T_{2})$ in O(n m) time and $\tau _{\textsc {LcaRt}}^{u}(T_{1},T_{2})$ in O(n m d) time. □

Theorem 5

We can compute $\tau _{\textsc {LcaSg}}^{o}(T_{1},T_{2})$ in O(nm) time and $\tau _{\textsc {LcaSg}}^{u}(T_{1},T_{2})$ in O(nmd) time.

Proof

For every pair (i, j) ∈ T ₁ × T ₂, we define the distance d(T ₁, i, T ₂, j) between T ₁ and T ₂ based on $\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])$ as follows.

$$d(T_{1},i,T_{2},j) = \tau_{\textsc{Top}}^{o}(T_{1}[i],T_{2}[j]) +\sum\limits_{i^{\prime}\in T_{1}-T_{1}[i]}\gamma(i^{\prime},\varepsilon) +\sum\limits_{j^{\prime}\in T_{2}-T_{2}[j]}\gamma(\varepsilon, j^{\prime}). $$

By Remark 4, it holds that

$$\tau_{\textsc{LcaSg}}^{o}(T_{1},T_{2}) =\min\{ d(T_{1},i,T_{2},j) \mid 1\le i\le n,1\le j\le m \}, $$

which we can compute in O(n m) time. Also, by Remark 4, we can replace $\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])$ in d(T ₁, i, T ₂, j) with $\tau _{\textsc {Top}}^{u}(T_{1}[i],T_{2}[j])$ for computing $\tau _{\textsc {LcaSg}}^{u}$, which runs in O(n m d) time. □

Theorem 6

We can compute $\tau _{\textsc {AccSg}}^{o}(T_{1},T_{2})$ in O(nm) time and $\tau _{\textsc {SgAln}}^{o}(T_{1},T_{2})$ in O(nmD ² ) time.

Proof

Consider the recurrences of computing $\tau _{\textsc {AccSg}}^{o}$ and $\tau _{\textsc {SgAln}}^{o}$ in Fig. 11. The difference between the recurrences of computing $\tau _{\textsc {AccSg}}^{o}$ and those of $\tau _{\textsc {Acc}}^{o}$ [9] is the first two formulas in $\tau _{\textsc {AccSg}}^{o}$, and the difference between the recurrences of computing $\tau _{\textsc {SgAln}}^{o}$ and those of $\tau _{\textsc {Aln}}^{o}$ [6] is the first formula in $\tau _{\textsc {SgAln}}^{o}$, respectively. The first formula in $\tau _{\textsc {AccSg}}^{o}$ and $\tau _{\textsc {SgAln}}^{o}$ computes the distance for segmentations of (i, j) ∈ T ₁ × T ₂. In the second formula in $\tau _{\textsc {AccSg}}^{o}$, we replace γ(i, j) in $\tau _{\textsc {Acc}}^{o}$ with γ(i, ε) + γ(ε, j). Since we can compute them in constant time (after computing $\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])$ in advance according to Remark 4) and by the same discussion of [9] and [6], the statement holds. □

Theorem 7

We can compute $\tau _{\textsc {AccSg}}^{u}(T_{1},T_{2})$ in O(nmd) time.

Proof

We can compute $\tau _{\textsc {AccSg}}^{u}(T_{1},T_{2})$ in O(n m d) time by using the recurrences in Fig. 12 after storing every $\tau _{\textsc {Top}}^{u}(T_{1}(i),T_{2}(j))$ ((i, j) ∈ T ₁ × T ₂) in O(n m d) time according to the recurrences in Fig. 10 of Remark 4. □

Next, we investigate the variations of the bottom-up distance τ _Bot.

Remark

We say that two trees T ₁ and T ₂ are ordered (resp., unordered) label-free isomorphic, denoted by $T_{1}{\equiv _{l}^{o}} T_{2}$ (resp., $T_{1}{\equiv _{l}^{u}} T_{2}$), if there exists a mapping $M\in \mathcal {M}_{\textsc {Tai}}^{o}(T_{1},T_{2})$ (resp., $M\in \mathcal {M}_{\textsc {Tai}}^{u}(T_{1},T_{2})$), called a label-free isomorphism, such that I _M = J _M = ∅. Also we define diff ^r(T ₁, T ₂) (r ∈ {o, u}) as follows, where M is a label-free isomorphism between T ₁ and T ₂.

$$\begin{array}{rcl} \textit{diff}^{r}(T_{1},T_{2}) & = & \left\{ \begin{array}{l@{~~~}l} \sum\limits_{(i,j)\in M}\gamma(i,j) & \text{if }T_{1}{\equiv_{l}^{r}} T_{2},\\ \sum\limits_{i\in T_{1}}\gamma(i,\varepsilon)+{\sum}_{j\in T_{2}}\gamma(\varepsilon,j) & \text{if }T_{1}{\not\equiv_{l}^{r}} T_{2}. \end{array} \right. \end{array} $$

We can check $T_{1}{\equiv _{l}^{r}} T_{2}$ and compute diff ^r(T ₁, T ₂) in O(n + m) time [15].

Theorem 8

We can compute $\tau _{\textsc {LcaBot}}^{o}(T_{1},T_{2})$ and $\tau _{\textsc {LcaBot}}^{u}(T_{1},T_{2})$ in O(nm) time.

Proof

For r ∈ {o, u}, by replacing $\tau _{\textsc {Top}}^{r}(T_{1}[i],T_{2}[j])$ in d(T ₁, i, T ₂, j) in Theorem 5 with diff ^r(T ₁[i], T ₂[j]) and by Remark 5, we can compute $\tau _{\textsc {LcaBot}}^{r}(T_{1},T_{2})$ in O(n m) time. □

Theorem 9

We can compute $\tau _{\textsc {AccBot}}^{o}(T_{1},T_{2})$ in O(nm) time and $\tau _{\textsc {BotAln}}^{o}(T_{1},T_{2})$ in O(nmD ² ) time.

Proof

Remark 5 implies that we can compute $\tau _{\textsc {AccBot}}^{o}$ and $\tau _{\textsc {BotAln}}^{o}$ by replacing $\tau _{\textsc {Top}}^{o}(T_{1}[i],T_{2}[j])$ in the recurrences of computing $\tau _{\textsc {AccSg}}^{o}$ and $\tau _{\textsc {SgAln}}^{o}$ in Fig. 11 with diff ^o(T ₁[i], T ₂[j]) in O(n m) time and in O(n m D ²) time, respectively. □

Theorem 10

We can compute $\tau _{\textsc {AccBot}}^{u}(T_{1},T_{2})$ in O(nmd) time.

Proof

We extend the function diff ^u for unordered trees in Remark 5 to a function fdiff ^u for forests. Let F ₁ = [S ₁, … , S _k] and F ₂ = [T ₁, … , T _l] be forests. First, for X = {1, … , k} and Y = {1, … , l}, we construct a weighted bipartite graph G = (X, Y, E) such that (i, j) ∈ E if and only if $S_{i}{\equiv _{l}^{u}} T_{j}$ and the weight ω((i, j)) = |S _i|+|T _j|−diff ^u(S _i, T _j). Also, let ⊆ X × Y be the maximum weighted bipartite matching of G, $\mathtt {BM}_{X}^{-}=\{i\in X\mid (i,j)\not \in \mathtt {BM}\}$ and ${\tt BM}_{Y}^{-}=\{j\in Y\mid (i,j)\not \in \mathtt {BM}\}$. Then, we define fdiff ^u(F ₁, F ₂) as follows.

$$\textit{fdiff}^{u}(F_{1},F_{2}) = \sum\limits_{(i,j)\in \mathtt{BM}}\textit{diff}^{u}(S_{i},T_{j}) +\sum\limits_{i\in {\mathtt{BM}_{X}^{-}}}|S_{i}| +\sum\limits_{j\in {\mathtt{BM}_{X}^{-}}}|T_{j}|. $$

It is obvious that fdiff ^u(F ₁, F ₂) computes the smallest value when selecting pairs of label-free isomorphic trees in F ₁ and F ₂, which is $\delta _{\textsc {AccBot}}^{u}(F_{1},F_{2})$. Hence, by replacing $\tau _{\textsc {Top}}^{u}(T_{1}[i],T_{2}[j])$ and $\delta _{\textsc {Top}}^{u}(T_{1}(i),T_{2}(j))$ in Fig. 12 with diff ^u(T ₁[i], T ₂[j]) and fdiff ^u(T ₁(i), T ₂(j)), we can compute $\tau _{\textsc {AccBot}}^{u}(T_{1},T_{2})$ in O(n m d) time. □

4.2 Intractable Cases

Suppose that π₁ and π₂ be two optimization problems. Then, we say that π₁ L c-reduces to π₂ if there exist polynomial-time algorithms f, g and constants α, β > 0 satisfying the following statements for an instance I of π₁:

1.
opt(f(I)) ≤ α⋅opt(I).
2.
For a solution of f(I) with weight s ₂, the algorithm g produces in polynomial time a solution of I with weight s ₁ such that |s ₁ − opt(I)| ≤ β⋅|s ₂ − opt(f(I))|.

If π₁ L-reduces π₂, and π₂ can be approximated in polynomial time within a factor 1 + ε, then π₁ can be approximated within the factor 1 + α β ε. If π₂ has a polynomial time approximation scheme (PTAS), then so does π₁ [12].

A problem is MAX SNP-hard if every problem in MAX SNP can be L-reduced to it. Since the composition of two L-reductions is also an L-reduction, a problem is MAX SNP-hard if a MAX SNP-hard problem can be L-reduced it. It is known that if any MAX SNP-hard problem has a PTAS, then P=NP. Hence, it is very unlikely for a MAX SNP-hard problem to have a PTAS [12].

Concerned with this paper, Zhang and Jiang [21] have first shown that the problem of computing $\tau _{\textsc {Tai}}^{u}(T_{1},T_{2})$ is MAX SNP-hard. Akutsu et al. [1] and Hirata et al. [5] have shown that this problem is also MAX SNP-hard even if the height of T ₁ and T ₂ is at most 2 and T ₁ and T ₂ are binary trees, respectively. Furthermore, Yamamoto et al. [18] have shown that the problem of computing $\tau _{\textsc {Bot}}^{u}(T_{1},T_{2})$ is MAX SNP-hard even if the degrees of T ₁ and T ₂ are binary, which implies the same MAX SNP-hardness of the problem of computing $\tau _{\textsc {Sg}}^{u}(T_{1},T_{2})$.

Note that we cannot apply this proof to showing that the problems of computing $\tau _{\textsc {SgAln}}^{u}(T_{1},T_{2})$ and $\tau _{\textsc {BotAln}}^{u}(T_{1},T_{2})$ are MAX SNP-hard, with preserving alignable mappings. Hence, in this paper, we give the similar proof of [18] to show that the problems of computing $\tau _{\textsc {SgAln}}^{u}(T_{1},T_{2})$ and $\tau _{\textsc {BotAln}}^{u}(T_{1},T_{2})$ are MAX SNP-hard, by using L-reduction from MAX 3SC-3:

Maximum Bounded Covering by 3-Sets (MAX 3SC-3) [8]:

Instance: A finite set S and a collection $\mathcal {C}$ of 3-element subsets of S, where every element of S occurs in at most three of the subsets in $\mathcal {C}$.

Solution: The largest covering of S, where a covering is a collection of mutually disjoint sets in $\mathcal {C}$.

Let S = {s ₁, … , s _m}, $\mathcal {C}=\{C_{1},\ldots ,C_{n}\}$ and C _i = {s _i1, s _i2, s _i3} be an instance of MAX 3SC-3, where s _i1, s _i2, s _i3 ∈ S. Also, let $\mathcal {C}^{*}$ be the largest covering of S. We assume that the cost function γ is a unit cost function.

We construct three trees $\hat {C}_{i}$ for C _i (1 ≤ i ≤ n), $\hat {s}_{j}$ for s _j (1 ≤ j ≤ m) and $\hat {D}$ illustrated in Fig. 13 (left), where λ is a new label. Then, we construct two trees T ₁ and T ₂ illustrated in Fig. 13 (right). We call the transformation from an instance of MAX 3SC-3 to trees T ₁ and T ₂ f. Then, we can show the following lemma as same as [18].

Lemma 4

For A ∈ {SgAln, BotAln}, let M be the minimum cost mapping in $\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})$ Then, for every $\hat {C_{i}}$ in T ₁ , M maps (a) all of the three subtree $\hat {s_{i1}}$ , $\hat {s_{i2}}$ and $\hat {s_{i3}}$ in $\hat {C_{i}}$ to the same three subtrees in T ₂ or (b) $\hat {C_{i}}$ to some dummy subtree $\hat {D}$ in T ₂.

Remark 6

Lemma 4 does not hold for a mapping in $\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})$ such that ∈ {Ilst, Acc, AccSg, AccBot}. For i and i′, suppose that $\hat {C_{i}}$ satisfies (a) and $\hat {C_{i^{\prime }}}$ satisfies (b) in Lemma 4. Let v _{i
j} (resp., w _{i
j}) denotes the leaf in T ₁ (resp., T ₂) labeled by $\hat {s_{ij}}$ and w _λ denotes some leaf in T ₂ labeled by λ.

Then, $M=\{ (v_{i1},w_{i1}), (v_{i2},w_{i2}), (v_{i^{\prime }1},w_{\lambda })\}$ is corresponding to the part of the mapping between T ₁ and T ₂ in Lemma 4. However, it holds that w _λ < w _i1 ⊔ w _i2 but $v_{i^{\prime }1}\not <v_{i1}\sqcup v_{i2}$. Hence, M is not an isolated-subtree mapping.

Lemma 5

For A ∈ {SgAln, BotAln}, $\tau _{\mathtt {A}}^{u}(T_{1},T_{2}) = 9n+3m-|\mathcal {C}^{*}|+2$.

Proof

Let M be the minimum cost mapping in $\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})$, and $k=|\mathcal {C}^{*}|$.

By Lemma 4, for every $C_{i}\in \mathcal {C}^{*}$, M maps the three subtrees $\hat {s}_{i1}$, $\hat {s}_{i2}$ and $\hat {s}_{i3}$ in $\hat {C}_{i}$ to the same three subtrees in T ₂. Since the cost of a mapping between $\hat {C}_{i}$ in T ₁ and $\hat {s}_{i1}$, $\hat {s}_{i2}$ and $\hat {s}_{i3}$ in T ₂ is 4 (which is the number of non-mapped nodes in $\hat {C}_{i}$ in T ₁) and $|\mathcal {C}^{*}|=k$, the cost of M concerned with $\mathcal {C}^{*}$ is 4k. Also, by Lemma 4, for every $C_{i}\in \mathcal {C}-\mathcal {C}^{*}$, M maps $\hat {C}_{i}$ to some dummy tree $\hat {D}$ in T ₂. Since the cost of a mapping between $\hat {C}_{i}$ in T ₁ and $\hat {D}$ in T ₂ is 9 (which is the number of pairs of nodes labeled by s _{i
j} (j = 1,2,3) in T ₁ and λ in T ₂) and $|\mathcal {C}-\mathcal {C}^{*}|=n-k$, the cost of M concerned with $\mathcal {C}-\mathcal {C}^{*}$ is 9(n − k). Since M does not touch m − 3k subtrees $\hat {s}_{j}$ of s _j in T ₂, n − (n − k) = k dummy subtrees $\hat {D}$ in T ₂ and 2 roots labeled by r, the concerned cost of M is 3(m − 3k)+13k+2.

Hence, it holds that $\tau _{\mathtt {A}}^{u}(T_{1},T_{2}) = \gamma (M) = 4k+9(n-k)+3(m-3k)+13k+2= 9n+3m-k+2$. □

Theorem 11

For A ∈ {SgAln, BotAln}, the problem of computing $\tau _{\mathtt {A}}^{u}(T_{1},T_{2})$ is MAX SNP-hard.

Proof

Consider the algorithm g to transform from a mapping $M\in \mathcal {M}_{\mathtt {A}}(T_{1},T_{2})$ between T ₁ and T ₂ to a covering $\mathcal {C}^{\prime }$ of $\mathcal {C}$ as follows:

If M maps $\hat {s}_{i1}$, $\hat {s}_{i2}$ and $\hat {s}_{i3}$ of $\hat {C}_{i}$ in T ₁ to the same three subtrees in T ₂, then add C _i to $\mathcal {C}^{\prime }$.

It holds that n/7 ≤ opt(I) from the proof of Theorem 7 in [21]. Then, by Lemma 5 and since m ≤ 3n and opt(I) ≥ 1, the following inequalities hold.

$$\begin{array}{rcl} \textit{opt}(f(I)) &= &9n+3m-\textit{opt}(I)+2 \le 18n-\textit{opt}(I)+2 \le 127\cdot\textit{opt}(I),\\ s_{2}-\textit{opt}(f(I)) &= &\gamma(M)-\tau_{\mathtt{A}}^{u}(T_{1},T_{2})\\ & \ge & 9n+3m-|\mathcal{C}^{\prime}|+2 -(9n+3m-\textit{opt}(I)+2)\\ & = & \textit{opt}(I)-|\mathcal{C}^{\prime}| = \textit{opt}(I)-s_{1}. \end{array} $$

Hence, (f, g) is an L-reduction from MAX 3SC-3 to this problem. □

Whereas the problem of computing $\tau _{\textsc {Aln}}^{u}(T_{1},T_{2})$ is MAX-SNP hard, it is tractable if the degrees of trees are bounded by some constant [6]. As same as $\tau _{\textsc {Aln}}^{u}(T_{1},T_{2})$, the following theorem also holds.

Theorem 12

The problems of computing $\tau _{\textsc {SgAln}}^{u}(T_{1},T_{2})$ and $\tau _{\textsc {BotAln}}^{u}(T_{1},T_{2})$ are tractable if the degrees of T ₁ and T ₂ are bounded by some constant.

Proof

We can design the algorithm to compute $\tau _{\textsc {SgAln}}^{u}$ by replacing $\tau _{\textsc {Top}}^{o}$ in $\tau _{\textsc {SgAln}}^{o}$ in Fig. 11 with $\tau _{\textsc {Top}}^{u}$ in Fig. 10, and by improving the recurrences of computing $\delta _{\textsc {SgAln}}^{o}$ in Fig. 11 to the recurrences in Fig. 14, which is same as the improvement of the recurrences from computing $\delta _{\textsc {Aln}}^{o}$ to computing $\delta _{\textsc {Aln}}^{u}$ with bounded degrees [6], where A ⊆ T ₁(i) and B ⊆ T ₂(j). Since the degrees of T ₁ and T ₂ are bounded by some constant, the number of combinations of A and B are bounded, so $\delta _{\textsc {SgAln}}^{u}(A,B)$ can be computed in polynomial time.

Furthermore, by replacing diff ^o in the recurrences of computing $\tau _{\textsc {BotAln}}^{o}$ (cf., Theorem 9) with diff ^u and by using the same discussion as above, we can design the recurrences of computing $\delta _{\textsc {BotAln}}^{u}$ in polynomial time. □

5 Conclusion

In this paper, we have characterized a Tai mapping hierarchy as several common subforests, that is, as an embedded subforest, an induced subforest and a complete subforest by focusing on the connections of nodes in a common subforest and as a non-twisting subforest, a parallel subforest, a subtree and a root-preserving subtree by focusing on the arrangements of subtrees in a common subforest. Then, we have introduced new mappings into the Tai mapping hierarchy illustrated in Fig. 2.

Next, we have investigated the metricity of the variations of the edit distance as the minimum cost of the above mappings and the time complexity of computing them. We summarize the results as Table 1 in Section 1. Also Figs. 15 and 16 illustrate the relationship between the Tai mapping hierarchy and time complexity of computing τ(T ₁, T ₂) for $\mathcal {M}_{\mathtt {A}}(T_{1},T_{2})$ as diagram forms.

Notes

The largest common subtree in the standard definition [1, 5, 9, 16, 21] is corresponding to the minimum cost of the LCA-preserving segmental mapping (LcaSg, defined below) which consists of pairs of nodes with the same label.
In contrast to the standard definition of deletion and insertion [9, 14], this paper allows the deletion of a root, in order to characterize the bottom-up distance explicitly. Then, this paper also allows that the resulting tree applying to the deletion becomes a forest.
Whereas Valiente [15] has introduced a bottom-up mapping that requires an isolated-subtree mapping, his algorithm computes a bottom-up distance that is not an isolated-subtree distance. Hence, we adopt the revised definition here not to be an isolated-subtree mapping. See [9, 18].
In the definition of an alignable mapping [9], F is not a forest but a tree, because we can assume that the alignable mapping always contains the pair of the roots of two trees, corresponding to the alignment tree [6]. On the other hand, since the intersection of the alignable mapping to other mappings does not always contain the pair of the roots of two trees, we use a forest F in the definition of an alignable mapping.

References

Akutsu, T., Fukagawa, D., Halldórsson, M.M., Takasu, A., Tanaka, K.: Approximation and parameterized algorithms for common subtrees and edit distance between unordered trees. Theoret. Comput. Sci. 470, 10–22 (2013)
Article MathSciNet MATH Google Scholar
Bille, P.: A survey on tree edit distance and related problems. Theoret. Comput. Sci. 337, 217–239 (2005)
Article MathSciNet MATH Google Scholar
Chawathe, S.S.: Comparing hierarchical data in external memory. Proc. VLDB’99, 90–101 (1999)
Demaine, E.D., Mozes, S., Rossman, B., Weiman, O.: An optimal decomposition algorithm for tree edit distance. ACM Trans. Algo., 6 (2009)
Hirata, K., Yamamoto, Y., Kuboyama, T.: Improved MAX SNP-hard results for finding an edit distance between unordered trees. Proc. CPM 2011. LNCS 6661, 402–415 (2011)
MATH Google Scholar
Jiang, T., Wang, L., Zhang, K.: Alignment of trees – an alternative to tree edit. Theoret. Comput. Sci. 143, 137–148 (1995)
Article MathSciNet MATH Google Scholar
Kan, T., Higuchi, S., Hirata, K.: Segmental mapping and distance for rooted ordered labeled trees. Fundamenta Informaticae 132, 1–23 (2014)
MathSciNet MATH Google Scholar
Kann, V.: Maximum bounded 3-dimensional matching is MAX SNP-complete. Inform. Process. Lett. 37, 27–35 (1991)
Article MathSciNet MATH Google Scholar
Kuboyama, T.: Matching and Learning in Trees. Ph.D thesis, University of Tokyo (2007)
Lu, C.L., Su, Z.-Y., Yang, C.Y.: A new measure of edit distance between labeled trees, Proc. COCOON’01. LNCS 2108, 338–348 (2001)
Google Scholar
Lu, S.-Y.: A tree-to-tree distance and its application to cluster analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1, 219–224 (1979)
Article MATH Google Scholar
Papadimitoriou, C.H., Yannakakis, M.: Optimization, approximation and complexity. J. Comput. System Sci. 43, 425–440 (1991)
Article MathSciNet Google Scholar
Selkow, S.M.: The tree-to-tree editing problem. Inform. Process. Lett. 6, 184–186 (1977)
Article MathSciNet MATH Google Scholar
Tai, K.-C.: The tree-to-tree correction problem. J. ACM 26, 422–433 (1979)
Article MathSciNet MATH Google Scholar
Valiente, G.: An efficient bottom-up distance between trees. Proc. SPIRE’01, 212–219 (2001)
Valiente, G.: Algorithms on Trees and Graphs. Springer (2002)
Wang, J.T.L., Zhang, K.: Finding similar consensus between trees: an algorithm and a distance hierarchy. Pattern Recog. 34, 127–137 (2001)
Article MATH Google Scholar
Yamamoto, Y., Hirara, K., Kuboyama, T.: Tractable and intractable variations of unordered tree edit distance. Internat. J. Found. Comput. Sci. 25, 307–330 (2014)
Article MathSciNet MATH Google Scholar
Zhang, K.: Algorithms for the constrained editing distance between ordered labeled trees and related problems. Pattern Recog. 28, 463–474 (1995)
Article Google Scholar
Zhang, K.: A constrained edit distance between unordered labeled trees. Algorithmica 15, 205–222 (1996)
Article MathSciNet MATH Google Scholar
Zhang, K., Jiang, T.: Some MAX SNP-hard results concerning unordered labeled trees. Inform. Process. Lett. 49, 249–254 (1994)
Article MathSciNet MATH Google Scholar
Zhang, K., Wang, J., Shasha, D.: On the editing distance between undirected acyclic graphs. Internat. J. Found. Comput. Sci. 7, 43–58 (1996)
Article MATH Google Scholar

Download references

Acknowledgments

The authors would like to thank Prof. Tetsuji Kuboyama at Gakushuin University, Prof. Kilho Shin at University of Hyogo and Prof. Tetsuhiro Miyahara at Hiroshima City University for fruitful discussion about tree edit distance and its variations. Also they would like to thank anonymous referees of Theory of Computing Systems for valuable comments to revise the submitted version of this paper.

Author information

Authors and Affiliations

Kyushu Institute of Technology, Kawazu 680-4, Iizuka, 820-8502, Japan
Takuya Yoshino & Kouichi Hirata

Authors

Takuya Yoshino
View author publications
You can also search for this author in PubMed Google Scholar
Kouichi Hirata
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kouichi Hirata.

Additional information

This work is partially supported by Grant-in-Aid for Scientific Research 16H02870, 16H01743, 15K12102, 26280085 and 24300060 from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yoshino, T., Hirata, K. Tai Mapping Hierarchy for Rooted Labeled Trees Through Common Subforest. Theory Comput Syst 60, 759–783 (2017). https://doi.org/10.1007/s00224-016-9705-1

Download citation

Published: 02 September 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s00224-016-9705-1

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Tai Mapping Hierarchy for Rooted Labeled Trees Through Common Subforest

Abstract

Similar content being viewed by others

Earth Mover’s Distance Between Rooted Labeled Unordered Trees Formulated from Complete Subtrees

Anchored Alignment Problem for Rooted Labeled Trees

Sakura: A Flexible Coding for Tree Hashing

1 Introduction

2 Edit Distance and Tai Mapping

Definition 1

Definition 2

Definition 3

Theorem 1

Definition 4

Definition 5

Example 1

3 Tai Mapping Hierarchy

Definition 6

Example 2

Lemma 1

Theorem 2

Proof

Remark 1

Remark 2

Definition 7

Remark 3

Lemma 2

Lemma 3

Proof

Theorem 3

Proof

4 Time Complexity of Computing Distances

4.1 Tractable Cases

Remark 4

Theorem 4

Proof

Theorem 5

Proof

Theorem 6

Proof

Theorem 7

Proof

Remark

Theorem 8

Proof

Theorem 9

Proof

Theorem 10

Proof

4.2 Intractable Cases

Lemma 4

Remark 6

Lemma 5

Proof

Theorem 11

Proof

Theorem 12

Proof

5 Conclusion

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation