Keywords

1 Introduction

XPath is the most widely used query language for XML documents; it is an open standard and constitutes a World Wide Web Consortium (W3C) Recommendation [1]. XPath has syntactic operators to navigate the tree using accessibility relations such as ‘child’, ‘parent’, ‘sibling’, et cetera, and can make tests on intermediate nodes. Core-XPath  [2] is the fragment of XPath 1.0 containing only the navigational behavior of XPath. Core-XPath can express properties on nodes with respect to the underlying tree structure of the XML document, such as ‘nodes with label b ’, or ‘nodes that have both a child with label a and a grandchild with label b ’. It can also express properties on paths along the tree such as ‘the ending node is the grandchild of the starting node’, or ‘the initial node has label a and has a child with a, and the ending node is the grandparent of the starting node’. The first type of formulas are evaluated on individual nodes and are called node expressions, while the formulas of the second type are evaluated on pairs of nodes and are called path expressions. However, Core-XPath cannot express conditions on the actual data contained in the attributes, such as with a node expression saying ‘this node has two children with different data values’, or ‘the value of this node coincides with the value of some descendant’. In contrast, Core-Data-XPath  [3] (which we here call simply \(\text {XPath} _{=}\)) can perform these data comparisons. Indeed, \(\text {XPath} _{=}\) is the extension of Core-XPath with the addition of (in)equality tests between attributes of elements in an XML document.

In the paper [4], the expressive power of fragments of \(\text {XPath} _{=}\) was studied, from a logical and model-theoretical point of view, when the set of navegational axes was taken among \(\downarrow , \uparrow \), and the reflexive-transitive closure of those axes. In that work, the semantic universe of study was that of data trees, whose nodes have a single label taken from a finite alphabet and a single data value from an infinite domain. A focus of study in that work was that of bisimulation, which is a classic tool of modal logics, used to determine equivalence between relational models. A node \(x\) of a data tree \(T\) and a node \(x'\) of a data tree \(T'\) are said to be bisimilar if they satisfy some special (depending of the studied fragment) back-and-forth conditions over the structure of the data tree. In [4], suitable notions of bisimulation were deviced for the \(\text {XPath} _{=} \) fragments under considerations. Then, showing a characterization result in the style of the Hennessy-Milner’s theorem for Basic Modal Logic[6], it was proven that if \(x\) and \(x'\) are bisimilar then they satisfy exactly the same node expressions, and that the converse is also true for trees whose every node only has a finite number of children. Hence, bisimulation coincides with logical equivalence, i.e., with indistinguishability by means of node expressions. The paper [4] also stated and proved theorems in the style of van Benthem’s for Basic Modal Logic [7], but in the context of \(\text {XPath} _{=} \). One of these theorems states that the downward fragment of \(\text {XPath} _{=} \) coincides with the bisimulation-invariant fragment of first-order logic with one free variable (over the adequate signature). For the case of the vertical fragment of \(\text {XPath} _{=} \), this characterization fails, but a weaker result is proved instead.

In [5], the study of bisimulation for \(\text {XPath} _{=}\) was expanded in order to encompass bisimulation notions over two-pointed data trees (i.e. a data tree and two specified nodes), giving a bisimulation notion for the downward and the vertical fragment of \(\text {XPath} _{=}\) over two-pointed data trees, and proving the corresponding characterization results between logical equivalence and bisimulation. In this way, the paper expanded the results of [4] from the domain of node expressions to that of path expressions.

In the current work we focus our study in a family of generalizations of \(\text {XPath} _{=} (\downarrow )\), which includes not only the capability of comparing the end nodes of two paths by data (in)equality, but also checking for other types of arbitrary n-ary predicates over nodes at the end nodes of n paths. Given a fixed set \({\mathcal R}\) of relation symbols with their arity, we generalize the concept of data trees to encompass arbitrary relations between nodes, and study the logic \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) over this universe. We give a general suitable bisimulation notion for these \({\mathcal R}\) fragments, and show a Hennessy-Milner-like characterization result connecting this notion with that of logical equivalence. Furthermore, we provide a theorem in the style of van Benthem’s result for Basic Modal Logic, thus characterizing these logics as fragments of first-order logic whose formulas are invariant under this new notion of bisimulation. While we initially state these results for the case of the full universe with no semantic restrictions, in the end we show that restricting ourselves to some universes of data trees preserves our results.

2 Preliminaries

A data tree is a directed tree whose nodes have a single label from a finite alphabet and a single data value from a (possibly infinite) data domain. \(\text {XPath} _{=}({\downarrow })\) is a logic that can express properties about these structures, for instance, we can say if a node has or not certain label or if a node is a child of another via the child axis \(\downarrow \). The most important capabilities of this logic lie in its data tests \(\langle \alpha =\beta \rangle \) or \(\langle \alpha \ne \beta \rangle \), which can compare the data values of two nodes. More precisely, a node satisfying \(\langle \alpha =\beta \rangle \) (resp. \(\langle \alpha \ne \beta \rangle \)) means that there are paths from it such that the first one satisfies \(\alpha \), the second one satisfies \(\beta \) and their final points have equal (resp. non-equal) data value. Having the same data value can be thought of as a binary equivalence relation between nodes, so a natural question that arises is the possibility of extending the types of comparisons that can be made at the end of paths.

Fixed \({\mathcal R}\), \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) is an extension of \(\text {XPath} _=({\downarrow })\) in the sense that now it can allow relations with arbitrary arity (not necessarily binary) between final points of paths from a certain node of a tree. Here, formulas of the type \(\langle \alpha _1,\alpha _2,\dots ,\alpha _n \rangle _r\) express this type of operation, where r represents a particular relation symbol and \({\mathcal R}\) is the set of such symbols. In this context, the data test \(\varphi :=\langle \alpha =\beta \rangle \) can be re-expressed as \(\langle \alpha ,\beta \rangle _{=_d}\) where the sub index \(=_d\) can be interpreted over a data tree as the equivalence relation of having same data value. Similarly, a label test \(\varphi := a\) can be re-expressed as \(\langle \varepsilon \rangle _a\), where a is an unary relation symbol.

Initially, we will consider that the universe of the models for \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) is still that of trees, but extended with an arbitrary relation over nodes for each \(r\in {\mathcal R}\) (with its respective arity). We call these models \({\mathcal R}\text {-trees}\).

Definition 1

Let \({\mathcal R}\) be a finite and non-empty set of relational symbols with arities given by the function \(\mathcal {A}:{\mathcal R}\rightarrow \mathbb {N}_{\ge 1}\). The formulas of \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) are defined by the grammar below:

$$\begin{aligned} \varphi ,\psi&= \varphi \wedge \psi \ | \ \lnot \varphi \ | \ \langle \alpha _1,\alpha _2,\dots ,\alpha _{\mathcal {A}(r)} \rangle _{r} \ | \ \langle \alpha _1,\alpha _2,\dots ,\alpha _{\mathcal {A}(r)} \rangle _{\overline{r}}&r \in {\mathcal R}\\ \alpha ,\beta&= \varepsilon \ | \ \downarrow \ | \ \alpha \beta \ | \ \alpha \cup \beta \ | \ [\varphi ] \end{aligned}$$

The first row generates the node expressions, and the second one, the path expressions: \(\langle \alpha _1,\alpha _2,\dots ,\alpha _{\mathcal {A}(r)} \rangle _{r}\) and \(\langle \alpha _1,\alpha _2,\dots ,\alpha _{\mathcal {A}(r)} \rangle _{\overline{r}}\) are called path tests where \(\alpha _i\) is a path expression for all \(i\in \{1,2,\dots ,\mathcal {A}(r)\}\); \(\alpha \beta \) and \(\alpha \cup \beta \) are respectively the concatenation and union between \(\alpha \) and \(\beta \); \([\varphi ]\) are node tests (as part of a path expression) where \(\varphi \) is a node expression; and the symbols \(\varepsilon \) and \(\downarrow \) are the self and child axes, respectively.

As usual we use \(\varphi \vee \psi \) as a shorthand for \(\lnot (\lnot \varphi \wedge \lnot \psi )\). By \(\boldsymbol{\overline{{\mathcal R}}}\) we indicate the set of the complement symbols \(\overline{r}\) with \(r\in {\mathcal R}\) and extend \(\mathcal {A}\) to \({\mathcal R}\cup \overline{{\mathcal R}}\) as \(\boldsymbol{\mathcal {A}(\overline{r}):=\mathcal {A}(r)}\) for \(r\in {\mathcal R}\).

From now on, we consider a fixed \({\mathcal R}\ne \emptyset \), a family of predicate symbols as in Definition 1.

Definition 2

An \(\boldsymbol{{\mathcal R}\text {-tree}}\) \(\mathcal {T}=\langle T, E_\downarrow , \{R_r\}_{r\in {\mathcal R}} \rangle \) is a set T with a binary relation \(E_\downarrow \subseteq T^2\) such that \(\langle T,E_\downarrow \rangle \) is a rooted tree, and with a family of relations \(\{R_r\}_{r\in {\mathcal R}}\), where \(R_r\subseteq T^{\mathcal {A}(r)}\). We use \(R_{\overline{r}}\) to abbreviate the complement of \(R_r\), that is, \(\boldsymbol{R_{\overline{r}}:=T^{\mathcal {A}(r)}\setminus R_r}\). If \(x,y\in T\), the pair \((\mathcal {T},x)\) is called a pointed model, and \((\mathcal {T},x,y)\) is called a two-pointed model.

Note 1

To simplify the notation when there is no risk of confusion, we will often simply use r to refer to the semantics \(R_r\) corresponding to the \({\mathcal R}\)-tree currently in discussion.

In some cases when a is an unary symbol, we might simply write a instead of the node expression \(\langle \varepsilon \rangle _a\).

Example 1

Let us consider \(\mathcal {T}=\langle T, E_\downarrow , \{a, b, c, =_n, =_{w}\} \rangle \) a representation of a very simple and brief bibliographical database as in Fig. 1. Here abc are unary predicates (representing in \(\mathcal {T}\) that a node is an author, book, or chapter, respectively. \(=_n, =_{w}\) are binary predicates, which express in \(\mathcal {T}\) when two nodes have the same numerical value for \(=_n\), or the same word for \(=_{w}\).

Fig. 1.
figure 1

A representation of a bibliographical database as in Example 1, via a tree whose nodes contain both a numeric and lexical value and where nodes can belong to any of three types of unary relations ((a)uthor), (b)ook, (c)hapter).

Definition 3

We now give the interpretation of the symbols from Definition 1 over an \({\mathcal R}\text {-tree}\) \(\mathcal {T}=\langle T, E_\downarrow , \{R_r\}_{r\in {\mathcal R}} \rangle \).

If \(\varphi \) is a node expression and \(\alpha \) a path expression, then we write \((\mathcal {T},x)\,\models \,\varphi \) iff \(x\in [\![{\varphi }]\!]_{\mathcal {T}}\) and \((\mathcal {T},x,y)\,\models \,\alpha \) iff \((x,y)\in [\![{\alpha }]\!]_{\mathcal {T}}\).

Remark 1

Note that we can express the node property \(\langle \alpha \rangle \), which indicates that from a node it is possible to descend to some node via the path expression \(\alpha \). Indeed, we can see that for any \(r \in {\mathcal R}\), the following formula expresses the desired property: \(\langle \alpha , \dots , \alpha \rangle _r \vee \langle \alpha , \dots , \alpha \rangle _{\overline{r}}\).

We can also define a node expression that is true on any node, irrespective of the semantics of the tree: take any \(r \in {\mathcal R}\) and define \(\top := \langle \varepsilon , \dots , \varepsilon \rangle _r \vee \langle \varepsilon , \dots , \varepsilon \rangle _{\overline{r}}\).

Example 2

From the root \(t_0 \) of the database represented in Fig. 1, we could ask whether there is an author having the same name as some book title. The answer to this depends on whether \((\mathcal {T},t_0) \,\models \, \langle \downarrow [\langle \varepsilon \rangle _a], \downarrow \downarrow [\langle \varepsilon \rangle _b] \rangle _{=_w}\). From what we can see in the graphical representation, this does not happen, as we can see that \([\![{\langle \downarrow [\langle \varepsilon \rangle _a], \downarrow \downarrow [\langle \varepsilon \rangle _b] \rangle _{=_w}}]\!]_{\mathcal {T}} = \emptyset \).

Note that here (as in some types of data logics) asking whether there are two different nodes with the same data value is in general not possible. For example, \([\![{a \wedge \langle \downarrow , \downarrow \rangle _{=_n}}]\!]_{\mathcal {T}}\) (see Note 1) will contain any node that corresponds to an author that has written a book, since there is no guarantee that the two witnesses of \(\downarrow \) are different. Indeed, in the figure \([\![{\langle \varepsilon \rangle _a \wedge \langle \downarrow , \downarrow \rangle _{=_n}}]\!]_{\mathcal {T}}\) would contain the node of Alexandre Dumas, even though it has only one book child.

Example 3

Consider an extension of \({\mathcal T}\) and \({\mathcal R}\) including the binary predicate \(=^d_{n}\), such that over \({\mathcal T}\), \(x=^d_{n} y\) iff (\(x=_{n} y\) and \(x\ne y\)). Now we can express in \({\mathcal T}\) properties such as ‘this author has two different books with the same number of pages’, via \(\varphi : a \wedge \langle \downarrow ,\downarrow \rangle _{=^d_{n}}\).

Definition 4

Two expressions \(\eta _1\) and \(\eta _2\) of the same type (node or path expression) are said to be semantically equivalent, written \(\boldsymbol{\eta _1 \equiv \eta _2}\) if for all \(\mathcal {T}\) we have that \([\![{\eta _1}]\!]_{\mathcal {T}}=[\![{\eta _2}]\!]_{\mathcal {T}}\).

Remark 2

It easy to see that the semantic equivalence is preserved by any syntactic construction. That is, negating two equivalent node expression results in equivalent node expressions, concatenating a path expression to two different but equivalent path expressions results in two equivalent path expressions, etc.

Remark 3

Let \(\overline{\gamma }=\gamma _1,\dots ,\gamma _k\) and \(\overline{\gamma }'=\gamma _{k+1},\dots ,\gamma _n\) be finite (and potentially empty) sequences of path expressions. For all \(\alpha \) and \(\beta \) path expressions and for all \(*\in {\mathcal R}\cup \overline{{\mathcal R}}\) with \(\mathcal {A}{(*)}=n+1\) we have that

$$ \langle \overline{\gamma },\alpha \cup \beta ,\overline{\gamma }' \rangle _*\equiv \langle \overline{\gamma },\alpha ,\overline{\gamma }' \rangle _*\vee \langle \overline{\gamma },\beta ,\overline{\gamma }' \rangle _* $$

Guided by this fact, we will re-define \(\mathbf {XPath}_{\boldsymbol{{\mathcal R}}}\boldsymbol{(\downarrow )}\) as the fragment of the original one (Definition 1) where we do not include the rule \(\alpha \cup \beta \) in the grammar. That is, the fragment whose path expressions do not have the symbol \(\cup \) in their syntax. Even though this union-free fragment is less expressive when considering both path and node expressions, the expressive power remains the same as the full fragment when considering only pointed models and node expressions, because of the semantic equivalence given in Remark 3.

We now give the definition of direct path expressions, which are path expressions without unnecessary concatenations of symbols. This definition is used later to define normal expressions, whose purpose is to simplify the proofs by induction.

Definition 5

A direct path expression \(\alpha \) is a path expression of the form \(\alpha =\varepsilon \) or \(\alpha =\downarrow \xi _1\downarrow \dots \downarrow \xi _n\) where each \(\xi _i\) is an empty string or a node test.

A normal expression is one with all the path expressions in its path tests being direct. We will formally define this idea by means the operator \(\text {sub}(-)\).

Definition 6

For a formula \(\eta \) we denote by \(\mathbf{sub} {\boldsymbol{(\eta )}}\) the set defined recursively as follows:

$$\begin{aligned}&\text {sub}(\varepsilon )=\text {sub}(\downarrow ):=\emptyset \quad \text {sub}(\alpha \beta ):=\text {sub}(\alpha )\cup \text {sub}(\beta )\\&\text {sub}(\lnot \varphi )=\text {sub}([\varphi ]):=\text {sub}(\varphi ) \quad \text {sub}(\varphi \wedge \psi ):=\text {sub}(\varphi )\cup \text {sub}(\psi )\\&\text {sub}(\langle \alpha _1,\alpha _2,\dots ,\alpha _n \rangle _{*}):=\{\alpha _1,\alpha _2,\dots ,\alpha _n\} \cup \bigcup _{i=1}^{n}\text {sub}(\alpha _i) \quad \text {for }*\in {\mathcal R}\cup \overline{{\mathcal R}}\end{aligned}$$

Definition 7

A formula \(\eta \) is a normal expression if all the path expressions in \(\text {sub}(\eta )\cup \{\eta \}\) are direct.

The downward depth of an expression measures the maximum depth from the current point of evaluation that the formula could potentially ‘see’. The idea is that, when analysing such an expression over a particular point or pair of points, nodes that are further down than this depth have no effect on the resulting truth value of the expression.

Definition 8

The downward depth of \(\eta \) denoted by \(\mathbf{dd} {\boldsymbol{(\eta )}}\) is the number defined as follows:

$$\begin{aligned}&\text {dd}(\lnot \varphi ):=\text {dd}(\varphi ) \quad \text {dd}(\varphi \wedge \psi ):=\max \{\text {dd}(\varphi ),\text {dd}(\psi )\} \\&\text {dd}(\lambda ) := 0 \quad \text {where }\lambda \text { represents the empty string} \\&\text {dd}(\varepsilon \beta ) := \text {dd}(\beta ) \quad \text {dd}([\varphi ]\beta ) := \max \{\text {dd}(\varphi ),\text {dd}(\beta )\} \quad \text {dd}(\downarrow \beta ) := 1+\text {dd}(\beta ) \\&\text {dd}(\langle \alpha _1,\alpha _2,\dots ,\alpha _n \rangle _{*}):= \max \{\text {dd}(\alpha _1),\text {dd}(\alpha _2),\dots ,\text {dd}(\alpha _n)\} \quad \text { for } * \in {\mathcal R}\cup \overline{{\mathcal R}}\end{aligned}$$

The set of all formulas with downward depth less than or equal to \(\ell \ge 0\) is written as \(\mathbf {XPath}_{\boldsymbol{{\mathcal R}}}\boldsymbol{(\downarrow )}\).

Note that this definition encompasses all (union-free) formulas in \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) and the function \(\text {dd}(-)\) is well-defined.

Remark 4

Let \(\overline{\gamma }=\gamma _1,\dots ,\gamma _k\) and \(\overline{\gamma }'=\gamma _{k+1},\dots ,\gamma _n\) be finite (and potentially empty) sequences of path expressions. The following semantic equivalences also preserve the downward depth.

  1. 1.

    \(\langle \overline{\gamma },[\varphi ]\alpha ,\overline{\gamma }' \rangle _{*}\equiv \varphi \wedge \langle \overline{\gamma },\alpha ,\overline{\gamma }' \rangle _{*}\)       \(*\in {\mathcal R}\cup \overline{{\mathcal R}}\) with \(\mathcal {A}(*) = n+1\)

  2. 2.

    \(\varepsilon \alpha \equiv \alpha \)

  3. 3.

    \([\varphi ][\psi ]\equiv [\varphi \wedge \psi ]\)

Proposition 1

For every formula \(\eta \) we have that

  1. i)

    If \(\eta \) is a node expression then there exists a normal node expression \(\eta '\) with \(\text {dd}(\eta )=\text {dd}(\eta ')\) and \(\eta \equiv \eta '\).

  2. ii)

    If \(\eta \) is a path expression then there exists a normal path expression \(\eta '\) with \(\text {dd}(\eta )=\text {dd}(\eta ')\) and \(\eta \equiv \eta '\).

Proof

One can easily prove the statement by syntactic induction over \(\eta \) and making use of the semantic equivalences from Remark 4.

3 Bisimulation and Equivalence

The classic Hennessy-Milner’s characterization theorem [6] for Basic Modal Logic establishes the relation between two notions: logical equivalence and bisimilarity. In our case, the former notion indicates when a pair of pointed models are indistinguishable by means of node expressions. The latter intuitively ensures that for each selection of paths in one of the models, there are copies in the other, preserving the possible relational properties between their respective final points.

Definition 9

Let \(\ell \ge 0\). Given \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) we say that they are \(\boldsymbol{{{\mathcal R}}_{\ell }}\)-logically equivalent and denote it by \(\boldsymbol{(\mathcal {T},x)\equiv ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')}\) if for all node expression \(\varphi \in \text {XPath}^{\ell }_{{\mathcal R}}(\downarrow ){}\) we have that \((\mathcal {T},x)\,\models \,\varphi \), if and only if, \((\mathcal {T}',x')\,\models \,\varphi \).

\((\mathcal {T},x)\) and \((\mathcal {T}',x')\) are \(\boldsymbol{{{\mathcal R}}}\)-logically equivalent, written \(\boldsymbol{(\mathcal {T},x)\equiv ^{{\mathcal R}}(\mathcal {T}',x')}\), if \(\boldsymbol{(\mathcal {T},x)\equiv ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')}\) for all \(\ell \ge 0\). In other words, if for any node expression \(\varphi \), \((\mathcal {T},x)\,\models \,\varphi \), if and only if, \((\mathcal {T}',x')\,\models \,\varphi \).

Definition 10

A path \(\mu \) in a tree \(\mathcal {T}\) is a sequence \(\mu =\mu _0\downarrow \mu _1\downarrow \dots \downarrow \mu _n\) where \(n\ge 0\), \(\mu _i\in T\) for all \(i\in \{0,\dots , n\}\) and \(E_\downarrow (\mu _i,\mu _{i+1})\) for all \(i\in \{0,1,\dots ,n-1\}\). The length \(\mathbf{len} \boldsymbol{(\mu ) :=n}\) is the number of symbols \(\downarrow \) in \(\mu \). The i-th node of \(\mu \) is \(\boldsymbol{[\mu ]_i := \mu _i}\) and \(\mathbf{end} \boldsymbol{{(\mu )}:=[\mu ]}_\mathbf{len} \boldsymbol{(\mu )}\) is the final node of \(\mu \).

We denote by \(\mathbf{Path} \boldsymbol{(\mathcal {T})}\) the set of all paths \(\mu \) in \(\mathcal {T}\). For a node \(x\in \mathcal {T}\), \(\mathbf{Path} \boldsymbol{(\mathcal {T},x)}\) are the paths \(\mu \) starting from the node x, i.e., \([\mu ]_0=x\) and by \(\mathbf{Path} _{\boldsymbol{k}}\boldsymbol{(\mathcal {T},x)}\) we refer to the subset of paths in \(\text {Path}_{}{(\mathcal {T},x)}\) with length at most k.

The concatenation \(\boldsymbol{\mu \odot \nu }\) of two paths \(\mu ,\nu \in \text {Path}_{}{(\mathcal {T})}\) such that \(\nu _0 = \text {end}(\mu )\) is defined as \([\mu \odot \nu ]_i := [\mu ]_i\) for all \(i\in \{0,\dots ,\text {len}(\mu )\}\) and \([\mu \odot \nu ]_{\text {len}(\mu )+i} := [\nu ]_i\) for all \(i\in \{0,\dots ,\text {len}(\nu )\}\).

Definition 11

Given a \({\mathcal R}\text {-tree}\) \(\mathcal {T}\) and a node \(x\in T\), \(\boldsymbol{\mathcal {T}|_\ell ^x}\) is the \({\mathcal R}\text {-tree}\) whose underlying tree is the set of nodes \(y\in T\) for which there exists a path \(\mu \in \text {Path}_{\ell }{(\mathcal {T},x)}\) with \(\text {end}(\mu )= y\) (note that x is the root of such tree).

Remark 5

For all \(\ell \ge 0\) and \({\mathcal R}\text {-tree}\) \(\mathcal {T}\), we have that \((\mathcal {T}|_\ell ^x, x)\equiv ^{{\mathcal R}}_\ell (\mathcal {T},x)\).

Definition 12

Let \(\mathcal {T}\) and \(\mathcal {T}'\) be \({\mathcal R}\text {-trees}\). An \(\boldsymbol{{{\mathcal R}}_{\ell }}\)-bisimulation \(\mathcal {Z}=\{Z_k\}_{0\le k\le \ell }\) is a family of relations \(Z_k\subseteq T\times T'\) such that for all k, for all \((x,x')\in Z_k\), and for all \(n\in \text {Im}(\mathcal {A})\), the clauses below hold.

  • Zig For every selection of paths \(\mu _1,\mu _2,\dots ,\mu _n\in \text {Path}_{k}{(\mathcal {T},x)}\), there exist paths \({\mu '_1},{\mu '_2},\dots ,{\mu '_n}\in \text {Path}_{k}{(\mathcal {T}',x')}\) such that for all \(j\in \{1,\dots ,n\}\) and for all \(r\in {\mathcal R}\) we have that:

    1. i)

      \(\text {len}(\mu _j)=\text {len}({\mu '_j})\)

    2. ii)

      \(([\mu _{j}]_{i}, [{\mu '_j}]_{i}) \in Z_{k-i} \ \forall i\in \{0,\dots , \text {len}(\mu _j) \}\)

    3. iii)

      \(R_r(\text {end}(\mu _1),\text {end}(\mu _2),\dots ,\text {end}(\mu _n)) \Leftrightarrow R'_r(\text {end}({\mu '_1}),\text {end}({\mu '_2}),\dots ,\text {end}({\mu '_n}))\)

  • Zag For every selection of paths \({\mu '_1},{\mu '_2},\dots ,{\mu '_n}\in \text {Path}_{k}{(\mathcal {T}',x')}\) there exist paths \(\mu _1,\mu _2,\dots ,\mu _n\in \text {Path}_{k}{(\mathcal {T},x)}\) such that for all \(j\in \{1,\dots ,n\}\) and for all \(r\in {\mathcal R}\) we have that:

    1. i)

      \(\text {len}(\mu _j)=\text {len}({\mu '_j})\)

    2. ii)

      \(([\mu _{j}]_{i}, [{\mu '_j}]_{i}) \in Z_{k-i} \ \forall i\in \{0,\dots , \text {len}(\mu _j) \}\)

    3. iii)

      \(R_r(\text {end}(\mu _1),\text {end}(\mu _2),\dots ,\text {end}(\mu _n)) \Leftrightarrow R'_r(\text {end}({\mu '_1}),\text {end}({\mu '_2}),\dots ,\text {end}({\mu '_n}))\)

Two pointed models \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) are said to be \(\boldsymbol{{{\mathcal R}}_{\ell }}\)-bisimilar, denoted \(\boldsymbol{(\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')}\), if there exists an \({{\mathcal R}}_{\ell }\)-bisimulation \(\mathcal {Z}=\{Z_i\}_{0\le i\le \ell }\) such that \((x,x')\in Z_{\ell }\).

If \(Z\subseteq T\times T'\) is a relation such that for all \(\ell \ge 0\) the family \(\{Z_i \ | \ Z_i=Z\}_{0\le i\le \ell }\) is a \({{\mathcal R}}_{\ell }\)-bisimulation then we call Z an \(\boldsymbol{{{\mathcal R}}}\)-bisimulation. If there exists a \({{\mathcal R}}\)-bisimulation Z with \((x,x')\in Z\) then \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) are \(\boldsymbol{{{\mathcal R}}}\)-bisimilar, written as \(\boldsymbol{(\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}(\mathcal {T}',x')}\).

Remark 6

It is useful to observe that, when there are predicates of arity 2 or greater, the Zig (and Zag) conditions can be replaced in the case of unary predicates with a simpler ‘Harmony’ condition in the style of bisimulation for modal logics and \(\text {XPath} _{=} \): for any unary predicate u it is enough to check that whenever \(xZ x'\), then \(u(x)\) iff \(u(x')\). Note, however, that this replacement cannot be done if we only have unary predicates in \({\mathcal R}\), since doing so would remove all Zig and Zag conditions, and thus we would not be comparing any topological information about the models.

Remark 7

Using the Remark 6, we can see that our Definition 12 for the concept of \({\mathcal R}\)-bisimulation generalizes the definition of bisimulation for \(\text {XPath} _{=} (\downarrow )\) over the universe of data trees from [4]. It does so by taking \({\mathcal R}= \{=_d\} \cup \mathbb {A}\), where \(=_d\) is a binary symbol (interpreted as data equality over data trees) and the finite symbols in the label set \(\mathbb {A}\) are unary predicates. See also Theorem 4 and the discussion preceding it.

Remark 8

The notion of \({{\mathcal R}}\)-bisimilarity given in Definition 12 does not coincide with bisimilarity for multi-relational Kripke models (where \(\downarrow \) and each relation from \({\mathcal R}\) get their own modal operator, and where we translate from \({\mathcal R}\text {-trees}\) into Kripke models). Indeed, consider the case where \({\mathcal R}\) consists solely of a binary relation \(=_d\) (we will represent the semantics of \(=_d\) with numeric data values). For an example of two pointed models that are modally bisimilar but not \({{\mathcal R}}\)-bisimilar, consider: on one hand the infinite linear tree \((\mathcal {T},x)\), where the root x has data value 1, the sole next child has data value 2, the next one has data value 1, and so on alternating between these two values (i.e. \(1,2,1,2, \dots \)); on the other hand, take the infinite linear tree \((\mathcal {T}',x')\) that has data value 1 in all nodes (i.e. \(1,1,1,1, \dots \)).

Intuitively, while modal bisimulation appears to have greater navigational freedom by being able to move with any modality from \({\mathcal R}\), when doing that it cannot keep track of the actual topology of the model (given by \(\downarrow )\) nor can it ask whether the endpoints of paths are related via an \(r \in {\mathcal R}\).

Example 4

Let \({\mathcal R}= \{b, f, S\}\), where bf are unary predicate symbols and S is a ternary predicate symbol. We consider the \({\mathcal R}\text {-trees}\) \(\mathcal {T}, \mathcal {T}'\) from Fig. 2, where a node in b is represented as having a red border, and a node satisfying f is represented as being filled with the color black. In both trees, S has an interpretation related to the numbers: \(S(x, y, z)\) iff \(d(x) = d(y) + d(z)\), where d(w) represents the number drawn on the node w. If we call \(t_0, t_0 '\) the respective roots of both trees, it is easy to verify that \((\mathcal {T}, t_0)\) and \((\mathcal {T}', t_0 ')\) are \({\mathcal R}\)-bisimilar via the represented Z relation. It is important to note that it does not matter that \(\mathcal {T}\) represents the values of nodes in integers and \(\mathcal {T}'\) does so with rational numbers: the only thing that matters is the semantics of each predicate in \({\mathcal R}\), as the logic itself has no way of ‘seeing’ these particular values (and indeed some possible semantics of S cannot be expressed in this form).

Fig. 2.
figure 2

A representation of two \({\mathcal R}\text {-trees}\) \(\mathcal {T}\) and \(\mathcal {T}'\) for \({\mathcal R}= \{b,f,S\}\), as in Example 4. Also represented in the figure is a Zig step and, via a dotted line connecting nodes, a \(\{b,f,S\}\)-bisimulation Z between \((\mathcal {T}, t_0)\) and \((\mathcal {T}', t_0 ')\).

As we have for modal logics, the bounded notions \({\mathcal R}_\ell \)-logical equivalence and \({\mathcal R}_\ell \)-bisimulation coincide. That is, two pointed models are \({\mathcal R}_\ell \)-logically equivalent, if and only if, they are \({\mathcal R}_\ell \)-bisimilar.

The proof of each one of the following statements can be found in the Appendix.

Lemma 1

For every \(\ell \ge 0\), there are finitely many equivalence classes (modulo semantic equivalence) of node expressions with downward depth at most \(\ell \).

Corollary 1

For each \(\ell \ge 0\) and a pointed model \((\mathcal {T},x)\) there is a node expression \(\chi ^{\ell }_{(\mathcal {T},x)}\in \text {XPath}^{\ell }_{{\mathcal R}}(\downarrow ){}\) satisfying that for any pointed model \((\mathcal {T}',x')\), \((\mathcal {T},x)\equiv ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')\) iff \((\mathcal {T}',x')\,\models \,\chi ^{\ell }_{(\mathcal {T},x)}\).

Proposition 2

Given \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) we have that if \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}_{\ell } (\mathcal {T}',x')\) then \((\mathcal {T},x)\equiv ^{{\mathcal R}}_{\ell } (\mathcal {T}',x')\).

Proposition 3

Given \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) we have that if \((\mathcal {T},x)\equiv ^{{\mathcal R}}_{\ell } (\mathcal {T}',x')\) then \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}_{\ell } (\mathcal {T}',x')\).

We can unify Proposition 2 and Proposition 3 in the following statement:

Theorem 1

Given \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) we have that \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}_{\ell } (\mathcal {T}',x')\), if and only if, \((\mathcal {T},x)\equiv ^{{\mathcal R}}_{\ell } (\mathcal {T}',x')\).

4 Characterizations

In this section we state and prove characterization results for \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\), in the style of Hennessy-Milner’s theorem [6] for Basic Modal Logic. Our result states that \({\mathcal R}\)-logical equivalence and \({\mathcal R}\)-bisimulation agree over models whose underlying tree is finitely branching. After giving this result, we will treat \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) as a fragment of the first order logic in order to show a characterization result in the style of van Benthem’s theorem [7] for Basic Modal Logic, concluding that the formulas of \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) are exactly those of the first order logic (with certain signature) which are preserved by \({\mathcal R}\)-bisimulation.

Proposition 4

Given \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) we have that if \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}} (\mathcal {T}',x')\) then \((\mathcal {T},x)\equiv ^{{\mathcal R}} (\mathcal {T}',x')\).

Proof

Suppose \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}} (\mathcal {T}',x')\) via \((x,x')\in Z\subseteq T\times T'\). For every \(\ell \ge 0\), the family \(\mathcal {Z}_\ell :=\{Z_0=Z_1=\dots =Z_\ell := Z\}\) is a \({\mathcal R}_\ell \)-bisimulation between \((\mathcal {T},x)\) and \((\mathcal {T}',x')\). Thus, by Theorem 1, for every \(\ell \ge 0\) we have that \((\mathcal {T},x)\equiv ^{{\mathcal R}}_{\ell }\). In other words, \((\mathcal {T},x)\equiv ^{{\mathcal R}}(\mathcal {T}',x')\).

Definition 13

A pointed model \((\mathcal {T},x)\) is finitely branching if for all \(k\ge 0\) the set \(\text {Path}_{k}{(\mathcal {T},x)}\) is finite.

Theorem 2 (Hennessy-Milner’s style characterization)

If \((\mathcal {T},x)\) and \((\mathcal {T},x')\) are finitely branching, then \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}} (\mathcal {T}',x')\) if and only if \((\mathcal {T},x)\equiv ^{{\mathcal R}} (\mathcal {T}',x')\).

Proof

  • (\(\Rightarrow \)) By Proposition 4.

  • (\(\Leftarrow \)) Consider the relation \(Z\subseteq T\times T'\), which is defined such that \((z,z')\in Z\) iff \((\mathcal {T},z)\equiv ^{{\mathcal R}}(\mathcal {T}',z')\). The proof to show that Z is a \({\mathcal R}\)-bisimulation is analogous to that which was given for Proposition 3 in order to see that \(\mathcal {Z}\) was a \({\mathcal R}_\ell \)-bisimulation. Now, we define \({\varPhi }^{(j,i)} :=\bigwedge _{\overline{\mu }\in P} \varphi _{\overline{\mu }}^{(j,i)}\) since the set P is finite because \((\mathcal {T}',x')\) is finitely branching.

Remark 9

The classical counterexample for the modal logic with only one modality still works to see that the finitely branching hypothesis is required: take on one side an infinitely branching tree constructed with one branch of each finite length; on the other side, a copy of the previous tree with the addition of an infinitely long linear branch hanging from the root. For every \(r \in {\mathcal R}\), we set in both trees that \(R_r = \emptyset \). It can be seen that both \({\mathcal R}\text {-trees}\) with their roots are logically equivalent but that any proposed Z fails to be a bisimulation.

From now on, we focus on \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) as a fragment of the first-order logic \(\text {FO}^{}(\sigma _{\mathcal R})\) with the natural signature \(\sigma _{\mathcal R}\), and with a standard translation \(\text {ST}\left( -\right) \) between formulas of \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) and of first-order logic. For details, see the corresponding Definitions 16 and 17 in the Appendix.

The following lemmas are the key for proving the van Benthem’s characterization style theorem. Before stating them, we need to give some definitions:

Definition 14

Let \(\ell \ge 0\) and \(\psi (u),\varphi (u)\in \text {FO}^{}(\sigma _{\mathcal R})\) with one free variable u.

  • \(\psi (u)\) is \(\boldsymbol{\ell }\)-local if for all pointed model \((\mathcal {T},x)\), \(\mathcal {T}\,\models \, \psi [u\mapsto x]\) iff \(\mathcal {T}|_\ell ^x \,\models \, \psi [u\mapsto x]\).

  • \(\psi (u)\) is \(\boldsymbol{\leftrightharpoons ^{{\mathcal R}}}\)-invariant if for all \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) such that \(\mathcal {T}\,\models \, \psi [u\mapsto x]\) and \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}(\mathcal {T}',x')\) then \(\mathcal {T}'\,\models \,\psi [u\mapsto x']\).

  • \(\psi (u)\) is \(\boldsymbol{\leftrightharpoons ^{{\mathcal R}}_{\ell }}\)-invariant if for all \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) such that \(\mathcal {T}\,\models \, \psi [u\mapsto x]\) and \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')\) then \(\mathcal {T}'\,\models \,\psi [u\mapsto x']\).

  • \(\psi (u)\) and \(\varphi (u)\) are semantically equivalent over trees \(\boldsymbol{\psi (u)\overset{trees}{\equiv }\varphi (u)}\) if for all \((\mathcal {T},x)\) we have that \(\mathcal {T}\,\models \,\psi [u\mapsto x]\) iff \(\mathcal {T}\,\models \,\varphi [u\mapsto x]\).

  • A pointed model \((\mathcal {T},x)\) has depth at most \(\boldsymbol{\ell }\) if \(\text {Path}_{\ell }{(\mathcal {T},x)}=\text {Path}_{n}{(\mathcal {T},x)}\) for all \(n\ge \ell \).

Lemma 2

For each \(\leftrightharpoons ^{{\mathcal R}}\)-invariant \(\psi (u)\in \text {FO}^{}(\sigma _{\mathcal R})\) with one free variable u, there is \(\ell \ge 0\) such that \(\psi \) is \(\ell \)-local.

Proof

See Appendix.

Lemma 3

Suppose \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) have depth at most \(\ell \). Then \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}} (\mathcal {T}',x')\) if and only if \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')\). Equivalently, \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}} (\mathcal {T}',x')\), if and only if, \((\mathcal {T},x)\equiv ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')\).

Proof

The left-to-right direction is clear. For the other direction, suppose that we have \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')\) via \(\mathcal {Z}=\{Z_k\}_{0\le k\le \ell }\). Then we have that \(Z:=\bigcup _{k=0}^{\ell } Z_k\) is a \({\mathcal R}\)-bisimulation between \((\mathcal {T},x)\) and \((\mathcal {T}',x')\).

Theorem 3 (van Benthem’s style characterization)

Let \(\psi (u)\) be a formula in \(\text {FO}^{}(\sigma _{\mathcal R})\) with one free variable u. The following statements are equivalent:

  1. 1.

    \(\psi (u)\) is \(\leftrightharpoons ^{{\mathcal R}}\)-invariant.

  2. 2.

    There exists a node expression \(\varphi \) such that \(\psi (u)\overset{trees}{\equiv }\text {ST}\left( \varphi \right) (u)\)

Proof

  • (1\(\Rightarrow \)2) Suppose \(\psi (u)\) is \(\leftrightharpoons ^{{\mathcal R}}\)-invariant. By Lemma 2 we know that there is some \(\ell \ge 0\) for which \(\psi (u)\) is \(\ell \)-local. Let us see that \(\psi (u)\) is \(\leftrightharpoons ^{{\mathcal R}}_{\ell }\)-invariant for such \(\ell \): given \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) such that \(\mathcal {T}\,\models \, \psi [u\mapsto x]\) and \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')\) we want to show that \(\mathcal {T}'\,\models \,\psi [u\mapsto x']\). On one side, by Remark 5, \((\mathcal {T}|_{\ell }^x,x)\leftrightharpoons ^{{\mathcal R}}_{\ell }(\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}_{\ell }(\mathcal {T}',x')\leftrightharpoons ^{{\mathcal R}}_{\ell }(\mathcal {T}'|_{\ell }^{x'},x')\). On the other side, by Lemma 3, \((\mathcal {T}|_{\ell }^x,x)\equiv ^{{\mathcal R}}(\mathcal {T}'|_{\ell }^{x'},x')\). Since \(\mathcal {T}\,\models \, \psi [u\mapsto x]\) iff \(\mathcal {T}|_{\ell }^{x} \,\models \, \psi [u\mapsto x]\), we conclude that \(\mathcal {T}'|_{\ell }^{x'} \,\models \, \psi [u\mapsto x']\) iff \(\mathcal {T}'\,\models \,\psi [u\mapsto x']\). From this, it is clear that \(\varphi (u)\overset{trees}{\equiv }\text {ST}\left( \bigvee _{\mathcal {T}\,\models \,\varphi [u\mapsto x]} \chi ^{\ell }_{(\mathcal {T},x)}\right) (u)\) where \(\chi ^{\ell }_{(\mathcal {T},x)}\) is given by Corollary 1.

  • (2\(\Rightarrow \)1) Suppose \(\psi (u)\overset{trees}{\equiv }\text {ST}\left( \varphi \right) (u)\). Using Proposition 4, we can see that \(\text {ST}\left( \varphi \right) (u)\) is \(\leftrightharpoons ^{{\mathcal R}}\)-invariant. Indeed, if \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}(\mathcal {T}',x')\), then, by Proposition 4, \((\mathcal {T},x)\equiv ^{{\mathcal R}}(\mathcal {T}',x')\); also if \(\mathcal {T}\,\models \,\text {ST}\left( \varphi \right) [u\mapsto x]\), because \(\text {ST}\left( -\right) \) is truth-preserving then \((\mathcal {T},x)\,\models \,\varphi \). Hence, since \((\mathcal {T},x)\equiv ^{{\mathcal R}}(\mathcal {T}',x')\), we have that \((\mathcal {T}',x')\,\models \,\varphi \) which is equivalent to \(\mathcal {T}'\,\models \,\text {ST}\left( \varphi \right) [u\mapsto x']\). So \(\psi (u)\) is indeed \(\leftrightharpoons ^{{\mathcal R}}\)-invariant.

So far, we have considered that \({\mathcal R}\) had no fixed semantics on the universe, but for many cases it is reasonable to consider some restrictions, such that a particular binary relation is an equivalence relation, which is a reasonable assumption if we want to talk about \({\mathcal R}\text {-trees}\) as a generalization of data trees.

Let \(\mathbb {A}\) be a subset of unary symbols from \({\mathcal R}\), and let X, S, and T be three subsets of binary symbols from \({\mathcal R}\). We denote by \(\boldsymbol{\mathcal {U}^{{\mathcal R}}_{\mathbb {A}, X,S,T}}\) the class of \({\mathcal R}\text {-trees}\) \(\mathcal {T}\) satisfying that for every node \(x\in T\) there is a unique symbol \(a\in \mathbb {A}\) such that \(x\in R_a\), and for all \(r\in {\mathcal R}\) the relation \(R_r\) is reflexive if \(r\in X\), symmetric if \(r\in S\) and transitive if \(r\in T\). It can be seen that each formula \(\psi (u)\) in \(\text {FO}^{}(\sigma _{{\mathcal R}})\) that is invariant by bisimulation between structures in \(\mathcal {U}^{{\mathcal R}}_{\mathbb {A}, X,S,T}\) will be semantically equivalent (relative to \(\mathcal {U}^{{\mathcal R}}_{\mathbb {A}, X,S,T}\)) to the translation of a node expression in \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\). The main reason is that the \({\mathcal R}\text {-trees}\) \(\mathcal {T}_A\) and \(\mathcal {T}_B\) constructed in the proof of Lemma 2 are now in the class \(\mathcal {U}^{{\mathcal R}}_{\mathbb {A}, X,S,T}\), since each binary \(\tilde{R}_r\) inherits good properties from \(R_r\) as reflexivity, symmetry and transitivity (for the fresh nodes \(t_A\) and \(t_B\), we can freely assign them to any single unary set \(R_a\) with \(a\in \mathbb {A}\)). The invariance of \(\psi (u)\) between structures in this new class allows us to finish the proof. Thus, we can relativize van Benthem’s characterization to \(\mathcal {U}^{{\mathcal R}}_{\mathbb {A}, X,S,T}\). This gives us a generalization to the van Benthem’s characterization for \(\text {XPath} _{=}(\downarrow )\), where the symbols in \(\mathbb {A}\) are the labels and there is a symbol \(=_d\ \in X\cap S\cap T\) whose semantic is the pair of nodes with same data value. Therefore, we have:

Theorem 4

Let \(\psi (u)\) be a formula in \(\text {FO}^{}(\sigma _{\mathcal R})\) with one free variable u. The following statements are equivalent:

  1. 1.

    For all \((\mathcal {T},x)\) and \((\mathcal {T}',x')\) with \(\mathcal {T},\mathcal {T}' \in \mathcal {U}^{{\mathcal R}}_{\mathbb {A}, X,S,T}\), if \(\mathcal {T}\,\models \, \psi [u\mapsto x]\) and \((\mathcal {T},x)\leftrightharpoons ^{{\mathcal R}}(\mathcal {T}',x')\) then \(\mathcal {T}'\,\models \,\psi [u\mapsto x']\)

  2. 2.

    There exists a node expression \(\varphi \in \text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) such that for all \((\mathcal {T},x)\) with \(\mathcal {T}\in \mathcal {U}^{{\mathcal R}}_{\mathbb {A}, X,S,T}\) it happens that \((\mathcal {T},x)\,\models \,\psi (u)\) iff \((\mathcal {T},x)\,\models \,\text {ST}\left( \varphi \right) (u)\).

5 Conclusions and Future Work

We have provided (Definition 12) \({\mathcal R}\)-bisimulation notions for generalizations of the logic \(\text {XPath} _{=} (\downarrow )\) over the full universe of \({\mathcal R}\text {-trees}\). This notion coincides with that of [4] for an adequate \({\mathcal R}\) (Remark 7). We have shown that \({\mathcal R}\)-bisimulation coincides with \({\mathcal R}\)-logical equivalence over finitely branching pointed trees (Theorem 2). We have also characterized \(\text {XPath}^{}_{{\mathcal R}}(\downarrow ){}\) as the fragment of first-order logic that is invariant by \({\mathcal R}\)-bisimulations (Theorem 3). Finally, we have shown (Theorem 4) that our results also apply for universes of trees with some restricted semantics which contain the case of \(\text {XPath} _{=} (\downarrow )\) over data trees.

In the future, we would like to further generalize the work done in this paper, for example allowing other navigational modalities (such as sibling operators) for trees.