1 Introduction

Context unification is a generalization of word equations to terms. In the word equations problem we are given an alphabet \(\varSigma \) and a set of variables \(\mathcal X\). Then, given an equation of the form \(u = v\), where both u and v are words over the letters and variables, we ask about the existence of a substitution of variables by words over the alphabet that turns this formal equation into a true equality (of words over the alphabet). The first algorithm for this problem was given by Makanin [18]; the currently best algorithms utilize a different (and simpler) approach and work in PSPACE [11, 22]. The best known lower bound is NP, and it follows easily from, say, the NP-hardness of integer programming.

We view terms as rooted, ordered (meaning that the children of a node are ordered using the usual left-to-right order) trees, usually denoted with letters t or s. Nodes are labelled with elements from a ranked alphabet \(\varSigma \), i.e. each letter \(a \in \varSigma \) has a fixed arity \(ar(a)\); those elements are usually called letters. A tree is well-formed if a node labelled with f has exactly \(ar(f)\) children; we consider only well-formed trees, which can be equivalently seen as ground terms over \(\varSigma \). We will also use term notation to denote the trees in text, for example, \(f(c,c')\) denotes the tree with root labelled with f and two children, the first (left) labelled with c and the second (right) with \(c'\); those children are leaves.

When generalizing word equations from words to terms, one first needs to decide what a variable can represent. If a variable can only represent a well-formed term, then we arrive at the standard first-order unification problem, which can be solved in linear time; so this does not even generalize word equations. Thus the variables are allowed to take arguments, i.e. they define trees with missing subtrees. Formally, we extend the alphabet with parameter symbols \({\bullet }_1, {\bullet }_2,\ldots \) of arity 0. If a term t uses \({\bullet }_1, {\bullet }_2,\ldots ,{\bullet }_i\) then \(t(t_1,\ldots ,t_i)\), where \(t_1,\ldots ,t_i\) do not use parameters, is t in which \({\bullet }_j\) is replaced with \(t_j\). Thus the variables are ranked: X takes \(ar(X)\) arguments and the substitution for it has to use \({\bullet }_1, {\bullet }_2,\ldots ,{\bullet }_{ar(X)}\). For instance, an equation \(f(X(c),X(c)) = X(f(c,c))\) has a solution \(X = {\bullet }\). Under this substitution both sides evaluate to f(c, c). There are other solutions, for instance \(X = f({\bullet },{\bullet })\), which evaluates both sides to f(f(c, c), f(c, c)); in general, a solution that evaluates both sides to a full binary tree of arbitrary height is easy to construct. When no further restrictions are given, this problem is second order unification, which is undecidable [9].

In context unification we require that each \({\bullet }_j\) is used exactly once. For instance, the aforementioned equation \(f(X(c),X(c)) = X(f(c,c))\), as an instance of context unification, has exactly one solution: \(X = {\bullet }\); other solutions are excluded by the restriction that \({\bullet }\) is used exactly once. It is easy to see that the case of many arguments NP-reduces to the case with only one argument, and we deal only with this case further on. Context unification was introduced by Comon [1, 2], who also coined the name, and independently by Schmidt-Schauß [24]. It has found applications in the analysis of rewrite systems with membership constraints [1, 2], the analysis of natural language [20, 21], distributive unification [25] and bi-rewriting systems [13].

Context unification is both subsumed by second order unification (which is undecidable) and subsumes word equations (which are decidable). Furthermore, other natural problems between those two usually either trivially reduce to word equations or are undecidable. Thus, in a sense, context unification was the only such problem whose decidability remained open. This is one of the reasons why it gained considerable attention in the term rewriting community [23], and it is no wonder that a large body of work focused on context unification, with several partial results obtained [2, 6,7,8, 12, 15, 17, 26, 28, 29]. Note that in most cases the corresponding variants of general second order unification remain undecidable, which supported the conjecture that context unification is decidable.

Context unification was shown to be equivalent to the ‘equality up to constraint’ problem [20] (which is a common generalisation of equality constraints, subtree constraints and one-step rewriting constraints). In fact one-step rewriting constraints, a problem extensively studied on its own, are equivalent to stratified context unification [21]. It is known that the first-order theory of one-step rewriting constraints is undecidable [19, 32, 33]. For context unification itself, already the \(\forall \exists ^8\)-equational theory is \(\varPi _1^0\)-hard [34].

Some fragments of second order unification are known to reduce to context unification. Bounded second order unification assumes that the number of occurrences of the argument of the second-order variable in the substitution term is bounded by a constant; note that this number can be zero, which is the crucial difference with context unification; cf. monadic second order unification, which can be seen as a similar variant of word equations and is known to be NP-complete [14]. Bounded second order unification on the one hand easily reduces to context unification and on the other hand is known to be decidable [27]; in fact its generalisation to higher-order unification is decidable as well [30], and it is known that bounded second order unification is NP-complete [15].

Context unification can also be extended by allowing additional constraints on variables; a natural choice allows the usage of tree-regular constraints, i.e. for a variable we require that its substitution comes from a certain regular set of trees. It is known that such an extension is equivalent to linear second order unification [16], defined by Levy [12]: in essence, linear second order unification allows binding variables on different levels of the function, which makes a direct translation to context unification infeasible; however, the usage of regular constraints gives enough expressive power to encode such more complex binding. Note that the reductions are not polynomial and the equivalence is stated only at the level of decidability.

The usage of regular constraints is very popular in the case of word equations; in particular, it is used in generalisations of the algorithm for word equations to the group case, and essentially all known algorithms for the word equations problem can be generalised to word equations with regular constraints [3, 4, 31].

Results

The decidability status of context unification remained unknown for almost two decades. In this paper I present a proof that context unification can be solved in PSPACE, using a generalization of an algorithm for word equations; see [10] for a full version.

The idea is to apply simple compression rules (replacing pairs of neighbouring function symbols) to the solution of the context equation; to this end we appropriately modify the equation (without the knowledge of the actual solution) so that compressing the solution can be simulated by compressing parts of the equation. It is shown that if the compression operations are appropriately chosen, then the size of the instance remains polynomial during the whole run of the algorithm, thus giving a PSPACE upper bound. The best known lower bound is the same as for word equations, i.e. context unification is NP-hard. The method can be extended to the scenario in which tree-regular constraints for the variables are present, in which case the problem is EXPTIME-complete.

This idea, known under the name of recompression, was used before for word equations [11], simplifying the existing proof of containment in PSPACE. Furthermore, applications of compression to fragments of context unification were known before [6, 8, 15, 30] and the presented algorithm extends this method to terms in full generality. In this way solving word equations using recompression [11] generalises to solving context unification. This in some sense fulfils the original plan of extending the algorithms for word equations to context unification.

2 Definitions

2.1 Trees

As said before, we are given a ranked alphabet \(\varSigma \), i.e. there is an arity function \(ar : \varSigma \rightarrow \mathbb N\), and we deal with rooted (there is a designated root), ordered (there is a fixed linear order on the children of each node) \(\varSigma \)-labelled trees. We say that a tree is well-formed when a node labelled with \(a \in \varSigma \) has \(ar(a)\) children. We also view such trees as terms; a well-formed tree, seen as a term, is exactly a ground term over \(\varSigma \).

2.2 Patterns

We want to replace fragments of a tree with new nodes; those fragments are not necessarily well-formed. Thus we want to define ‘trees with holes’, where holes represent missing arguments. Let \(\mathbb Y = \{{\bullet }, {\bullet }_1, {\bullet }_2, \ldots \}\) be an infinite set of symbols of arity 0; we think of each of them as a placeholder for a missing argument, and its elements are collectively called parameters. A pattern is a tree over the alphabet \(\varSigma \cup \mathbb Y\) such that each parameter occurs at most once in it. The usual convention is that the used parameters are \({\bullet }_1, {\bullet }_2, \ldots ,{\bullet }_k \), or \({\bullet }\) when there is only one parameter; moreover, we usually assume that the order (according to the preorder traversal of the pattern) of occurrences of the parameters in the pattern is \({\bullet }_1, {\bullet }_2, \ldots ,{\bullet }_k \). We speak of parameter nodes and non-parameter nodes, meaning nodes labelled with parameters and non-parameters, respectively. A pattern using r parameters is called an r-pattern. A pattern p occurs (at a node v) in a tree t if p can be obtained by taking the subtree \(t'\) of t rooted at v and replacing some of the subtrees of \(t'\) by appropriate parameters. This is also called an occurrence of p in t. A pattern p is a subpattern of t if p occurs in t.

In a more classic terminology, 1-patterns are also called contexts, hence the name “context unification”.

Given a tree t, an occurrence of its r-subpattern p and an r-pattern \(p'\), we can naturally replace p with \(p'\): we delete the part of t corresponding to the non-parameter nodes of p, plug in the non-parameter part of \(p'\) instead, and reattach all the subtrees in the same order; as the number of parameters is the same, this is well defined. We can perform several replacements at the same time, as long as the occurrences of the replaced patterns do not share non-parameter nodes. In this terminology, our algorithm will replace occurrences of subpatterns of t in t.

We focus on some specific patterns: A chain is a pattern that consists only of unary nodes plus one parameter node. Chains that have two nodes labelled with different letters, i.e. of the form \(a (b ({\bullet }) )\) for \(a \ne b\), are called pairs; chains whose unary nodes are all labelled with the same letter a, i.e. of the form \(a (a ( \ldots (a({\bullet }) \ldots )))\), are called a-chains. A chain \(t'\) that is a subpattern of t is a chain subpattern of t; an occurrence of an a-chain subpattern \(a (a ( \ldots (a({\bullet }) \ldots )))\) is maximal if it cannot be extended (in t) by a, neither up nor down. A pattern of the form \(f({\bullet }_1,\ldots ,{\bullet }_{i-1},c,{\bullet }_{i},\ldots ,{\bullet }_{m-1})\), where f is a letter of arity m and c is a constant, is denoted as (f, i, c).

We treat chains as strings and write them in the string notation (in particular, we drop the parameters) and ‘concatenate’ them, that is, for two chains \(s_a = a_1(a_2(\ldots a_k({\bullet })\ldots ))\) and \(s_b = b_1(b_2(\ldots b_\ell ({\bullet })\ldots ))\) we write them as \(s_a = a_1 a_2 \cdots a_k\) and \(s_b = b_1b_2\cdots b_\ell \) and their concatenation \(s_a s_b = a_1 a_2 \cdots a_k b_1 b_2 \cdots b_\ell \) denotes the chain \(a_1(a_2(\ldots a_k(b_1(b_2(\ldots b_\ell ({\bullet })\ldots )) )\ldots ))\). In this convention ab denotes a pair and \(a^\ell \) denotes an a-chain. We use those conventions also for 1-patterns and also for 1-patterns followed by a single term, i.e. for 1-patterns \(p_{1},\ldots , p_{k}\) and a term t, \(p_1p_2\cdots p_kt\) denotes the term \(p_1(p_2(\ldots p_k(t)\ldots ))\).

2.3 Context Unification: Formal Statement

By \(\mathcal V \) we denote an infinite set of context variables X, Y, Z, .... We also use individual term variables x, y, z, ... taken from \(\mathcal X \). When we do not want to distinguish between a context variable and a term variable, we call it a variable and denote it by a small Greek letter, like \(\alpha , \beta , \gamma , \ldots \).

Definition 1

The terms over \(\varSigma \), \(\mathcal X \), \(\mathcal V \) are ground terms over the alphabet \(\varSigma \cup \mathcal X \cup \mathcal V \), in which we extend the arity function to \(\mathcal X \cup \mathcal V \) by \(ar(X) = 1\) for each \(X \in \mathcal V \) and \(ar(x) = 0\) for each \(x \in \mathcal X \). A context equation is an equation of the form \(u = v\) where both u and v are terms over \(\varSigma \cup \mathcal X \cup \mathcal V \).

We are interested in the solutions of the context equations, i.e. substitutions that replace term variables with ground terms and context variables with 1-patterns, such that a formal equality \(u = v\) is turned into a valid equality of ground terms. More formally:

Definition 2

A substitution is a mapping \(S\) that assigns a 1-pattern \(S(X)\) to each context variable \(X \in \mathcal V \) and a ground term \(S(x)\) to each variable \(x \in \mathcal X \). The mapping \(S\) is naturally extended to arbitrary terms as follows:

  • \(S(a) := a\) for each constant \(a \in \varSigma \);

  • \(S(f(t_1, \ldots , t_m)) := f(S(t_1), \ldots , S(t_m))\) for an m-ary \(f \in \varSigma \);

  • \(S(X(t)) := S(X) (S(t))\) for \(X \in \mathcal V \).

A substitution \(S\) is a solution of the context equation \(u = v\) if \(S(u) = S(v) \). The size of a solution \(S\) of an equation \(u = v\) is \(|S(u) |\), which is simply the total number of nodes in \(S(u)\). A solution is size-minimal if for every other solution \(S '\) it holds that \(|S(u) | \le |S '(u)|\). A solution \(S\) is non-empty if \(S(X) \) is not a parameter for each context variable \(X \in \mathcal V \) occurring in the context equation \(u = v\).

For a ground term \(S(u)\) and an occurrence of a letter a in it we say that this occurrence comes from u if it was obtained as \(S(a)\) in Definition 2 and that it comes from \(\alpha \) if it was obtained from \(S(\alpha )\) in Definition 2.
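To make Definition 2 concrete, here is a minimal sketch of applying a substitution, with terms represented as nested tuples (a Python sketch; the representation and all identifiers are illustrative and not from the paper; the single parameter of a 1-pattern is a distinguished leaf HOLE):

```python
# Terms as nested tuples: ("f", t1, t2); constants and term variables
# as one-element tuples; the parameter of a 1-pattern as HOLE.
HOLE = ("HOLE",)

def plug(pattern, t):
    """Replace the unique parameter of a 1-pattern by the term t."""
    if pattern == HOLE:
        return t
    return (pattern[0],) + tuple(plug(s, t) for s in pattern[1:])

def apply_subst(term, S):
    """Apply a substitution S (dict: variable -> ground term/1-pattern),
    following Definition 2."""
    head, args = term[0], term[1:]
    if head in S:
        # S(X(t)) = S(X)(S(t)) for a context variable; S(x) for a term variable
        return plug(S[head], apply_subst(args[0], S)) if args else S[head]
    return (head,) + tuple(apply_subst(s, S) for s in args)
```

For the equation of Example 1 below, `S = {"X": ("f", HOLE, ("c'",)), "Y": ("f", ("c",), HOLE)}` makes both `apply_subst(("X", ("c",)), S)` and `apply_subst(("Y", ("c'",)), S)` evaluate to `("f", ("c",), ("c'",))`, i.e. \(f(c,c')\).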

Example 1

Consider an alphabet \(\varSigma = \{f, c, c'\}\) with \(ar(f) = 2\) and \(ar(c) = ar(c') = 0\), and an equation \(X(c) = Y(c')\) over it. It has a solution (which is easily seen to be size-minimal) \(S(X) = f({\bullet },c')\) and \(S(Y) = f(c,{\bullet })\); under this substitution the equation evaluates to \(S(X(c)) = S(Y(c')) = f(c,c')\).

3 Local Compression of Trees

3.1 Compression Operations

We perform three types of subpattern compression on a tree t:

  • a-chain compression. For a unary letter a and every \(\ell > 1\) we replace each occurrence of a maximal a-chain subpattern \(a^\ell \) by a new unary letter \(a_\ell \).

  • ab compression. For two different unary letters a and b we replace each occurrence of a subpattern ab with a new unary letter c.

  • (f, i, c) compression. For a constant c and a letter f of arity \(m \ge 1\), we replace each occurrence of the subpattern (f, i, c), i.e. \(f({\bullet }_1,{\bullet }_2,\ldots ,{\bullet }_{i-1},c,{\bullet }_{i},\ldots ,{\bullet }_{m-1})\), with the subpattern \(f'({\bullet }_1,{\bullet }_2,\ldots ,{\bullet }_{i-1}, {\bullet }_{i},\ldots ,{\bullet }_{m-1})\), where \(f'\) is a fresh letter of arity \(m-1\) (intuitively: the constant c on the i-th place is ‘absorbed’ by its father labelled with f).

These operations are collectively called subpattern compressions. When we want to specify the type but not the actual subpattern compressed, we use the names chain compression, pair compression and leaf compression, or we simply speak of the p compression for a pattern \(p \in \{a, ab, (f, i, c)\}\).

The a-chain compression and ab compression are direct translations of the operations used in the recompression-based algorithm for word equations [11]. On the other hand, the leaf compression is a new operation that is designed specifically to deal with trees.
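As a minimal sketch (continuing the nested-tuple representation introduced after Definition 2; fresh letter names are supplied by the caller and all identifiers are illustrative), the three operations on ground terms can be implemented as follows:

```python
def compress_pair(t, a, b, c):
    """ab compression: replace each occurrence of a(b(..)) by c(..), a != b."""
    head, args = t[0], t[1:]
    args = tuple(compress_pair(s, a, b, c) for s in args)
    if head == a and args and args[0][0] == b:
        return (c,) + args[0][1:]          # contract the two unary nodes into c
    return (head,) + args

def compress_chains(t, a, fresh):
    """a-chain compression: replace each maximal a^l (l > 1) by fresh(l)."""
    head, args = t[0], t[1:]
    if head == a:
        l, s = 1, args[0]
        while s[0] == a:                   # walk down the maximal a-chain
            l, s = l + 1, s[1]
        s = compress_chains(s, a, fresh)
        return (fresh(l), s) if l > 1 else (a, s)
    return (head,) + tuple(compress_chains(s, a, fresh) for s in args)

def compress_leaf(t, f, i, c, f_new):
    """(f, i, c) compression: absorb the constant c at the i-th position
    (1-based) into its father f, which is renamed to f_new of arity m-1."""
    head, args = t[0], t[1:]
    args = tuple(compress_leaf(s, f, i, c, f_new) for s in args)
    if head == f and len(args) >= i and args[i - 1] == (c,):
        return (f_new,) + args[:i - 1] + args[i:]
    return (head,) + args
```

All three traverse the tree bottom-up, which makes each occurrence be replaced exactly once; freshness of the introduced letters guarantees that no accidental new occurrences are created.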

3.2 Compression of Non-crossing Patterns

Consider a context equation \(u = v\) and its solution \(S\). Suppose that we want to perform the ab compression on \(S(u)\) and \(S(v)\), i.e. we want to replace each occurrence of the subpattern ab with a fresh unary letter c. Such a replacement is easy when the occurrence of the ab subpattern comes from the letters in the equation or from \(S(\alpha )\) for some variable \(\alpha \): in the former case we modify the equation by replacing the occurrence of ab with c, in the latter the modification is done implicitly (i.e. we replace the occurrences of ab in \(S(\alpha )\) with c). The problematic part is the ab subpatterns that are of neither of those forms, as they ‘cross’ between \(S(\alpha )\) and some letter outside \(S(\alpha )\). This is formalised in the definition below; the intuition and the definition are similar for a-chains and (f, i, c) patterns.

Definition 3

For an equation \(u = v\) and a substitution \(S\) we say that an occurrence of a subpattern p in \(S(u)\) (or \(S(v)\)) is

  • explicit with respect to \(S\): if all non-parameter letters in this occurrence come from explicit letters in \(u = v\);

  • implicit with respect to \(S\): if all non-parameter letters in this occurrence come from \(S(\alpha )\) for a single occurrence of a variable \(\alpha \);

  • crossing with respect to \(S\): otherwise.

We say that ab is a crossing pair (a has a crossing chain; (f, i, c) is a crossing father-leaf subpattern) with respect to \(S\) if it has at least one crossing occurrence (there is a crossing occurrence of an \(a^\ell \) chain; it has at least one crossing occurrence) with respect to \(S\). Otherwise ab (a, (f, i, c)) is a non-crossing pair (has no crossing chain; is a non-crossing father-leaf subpattern) with respect to \(S\).

To make the notions more uniform, we will also say that \(p \in \{ab, a, (f,i,c)\}\) is a crossing/non-crossing subpattern, meaning that ab is a crossing/non-crossing pair, a has a crossing chain/has no crossing chains and (f, i, c) is a crossing/non-crossing father-leaf subpattern.

When ab is non-crossing with respect to a solution \(S\), we can simulate the ab compression of \(S(u)\) and \(S(v)\) on \(u = v\) simply by performing the ab compression on the explicit letters in the equation; we refer to this operation as the compression of a non-crossing subpattern. Then occurrences of ab that come from explicit letters are compressed, the ones that come from \(S(\alpha )\) are compressed by changing the solution, and there are no other possibilities. The same applies to the compression of a-chains and the (f, i, c) compression.
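In the nested-tuple representation this simulation is literally running the sketches of Sect. 3.1 on the two sides of the equation: occurrences of variables are opaque node labels, so only explicit letters are touched. For instance, the bb subpattern of Example 2 below is non-crossing, and its compression can be performed directly on the equation (a sketch; names are illustrative):

```python
lhs = ("b", ("X", ("a", ("X", ("c",)))))              # b X a X c
rhs = ("b", ("a", ("a", ("f",
          ("a", ("Y", ("c'",))),
          ("b", ("b", ("f", ("c",), ("c'",))))))))    # baa(f(aYc', bb(f(c,c'))))
fresh = lambda l: "b%d" % l                           # b2 plays the b' of the text
lhs2 = compress_chains(lhs, "b", fresh)               # unchanged: no explicit bb
rhs2 = compress_chains(rhs, "b", fresh)               # bb(...) becomes b2(...)
```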

As we discuss correctness of nondeterministic procedures, in the following we will say that a nondeterministic procedure is sound if, given a non-satisfiable context equation, it cannot transform it into a satisfiable equation, regardless of the nondeterministic choices.

Lemma 1

Compression of a non-crossing subpattern is sound.

Let \(u = v\) have a solution \(S\) and let \(p \in \{ab, a, (f,i,c)\}\) be a non-crossing subpattern. Then the returned equation \(u' = v'\) has a solution \(S '\) (over an alphabet expanded by the letters introduced during the subpattern compression) such that \(S '(u')\) is \(S(u)\) after the compression of p.

Fig. 1. An illustration of Example 2. The figure presents the tree obtained on both sides under the substitution; the values of the substitutions for the variables are depicted: with a dashed line for X and a dotted one for Y. On the right the tree after the compression of bb is depicted.

Example 2

Consider the following context equation over the alphabet \(\{a,b,c,c',f\}\), with \(ar(f) = 2\), \(ar(a) = ar(b) = 1\) and \(ar(c) = ar(c') = 0\):

$$ bXaXc = baa(f(aYc',bb(f(c,c')))) , $$

see also Fig. 1. It is easy to see that there is a unique solution \(S(X) = a a(f({\bullet },bb(f(c,c'))))\) and \(S(Y) = aa(f(c,bb(f(c,{\bullet }))))\).

The subpattern ba has a crossing occurrence on the left-hand side, as a is the first letter of \(S(X) \) and bX is a subpattern. Subpattern bb has only noncrossing occurrences, some of them explicit and some implicit. Subpatterns \((f,2,c'), (f,1,c), a^3\) are also crossing.

Compressing the b-chains (i.e. replacing bb with a fresh unary letter \(b'\)) leads to an equation

$$ b X a X c = baa(f(a Y c',b'(f(c,c')))) . $$

Then the solution is \(S '(X) = aa(f({\bullet },b'(f(c,c'))))\) and \(S '(Y) = aa(f(c,b'(f(c,{\bullet }))))\).

4 Uncrossing

In general, one cannot assume that an arbitrary pair ab, a-chain or (f, i, c) subpattern is non-crossing. However, for a fixed subpattern p and a solution \(S\) we can modify the instance so that this p becomes non-crossing with respect to a solution \(S '\) (that corresponds to \(S\) of the original equation). This modification is the cornerstone of our main algorithm, as it allows subpattern compression to be performed directly on the equation, regardless of what the solution actually looks like.

4.1 Uncrossing a Pair

We begin with showing how to turn a crossing pair ab into a non-crossing one. As a first step, we characterise crossing pairs in a more operational manner: for a non-empty substitution \(S\), a variable \(\alpha \) and a context variable X, by the first letter of \(S(\alpha )\) we denote the topmost letter in \(S(\alpha )\), and by the last letter of \(S(X)\) we denote the function symbol that is the father of ‘\({\bullet }\)’ in \(S(X)\). Then ab is crossing with respect to \(S\) if and only if one of the following conditions holds for some variable \(\alpha \) and context variable X:

  1. (CP1)

    \(a\alpha \) occurs in \(u = v\) and b is the first letter of \(S(\alpha ) \) or

  2. (CP2)

    Xb occurs in \(u = v\) and a is the last letter of \(S(X) \) or

  3. (CP3)

    \(X \alpha \) occurs in \(u = v\), a is the last letter of \(S(X) \) and b the first letter of \(S(\alpha )\).

In each of (CP1)–(CP3) it is easy to modify the instance so that ab is no longer a crossing pair:

  1. Ad (CP1):

    We pop up the letter b: we replace \(\alpha \) with \(b\alpha \). In this way we also modify the solution \(S(\alpha )\) from bt to t. If the new substitution for \(\alpha \) is empty (which can happen only when \(\alpha \) is a context variable), we remove \(\alpha \) from the equation.

  2. Ad (CP2):

    We pop down the letter a: we replace each occurrence of X with Xa. In this way we implicitly modify the solution \(S(X) \) from sa to s. If the new substitution for X is empty, we remove X from the equation.

  3. Ad (CP3):

    this is a combination of the two cases above, in which we need to pop-down from X and pop-up from \(\alpha \).

The whole uncrossing procedure can be even simplified: for each context variable X we guess its last letter and if it is a, then we pop it down. Similarly, for each variable \(\alpha \) we guess its first letter and if it is b then we pop it up. If at any point some context variable becomes empty then we remove its occurrences. We call the whole procedure the uncrossing of ab; its details are given in the pseudocode of Algorithm 1.

[Algorithm 1: pseudocode of the uncrossing of ab]
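A minimal sketch of this procedure in the representation used so far (the nondeterministic guesses become explicit inputs; `cvars`/`tvars` are the sets of context and term variables, `first[α]`/`last[X]` the guessed first/last letters, and `empty` the context variables whose substitution becomes the bare parameter and which are therefore removed; all names are illustrative):

```python
def uncross_pair(t, a, b, cvars, tvars, first, last, empty):
    """Rewrite one side of the equation for the uncrossing of ab:
    pop a down from context variables ending with a, pop b up from
    variables starting with b, drop variables that became empty."""
    head, args = t[0], t[1:]
    args = tuple(uncross_pair(s, a, b, cvars, tvars, first, last, empty)
                 for s in args)
    if head in cvars:                       # occurrence X(t)
        s = args[0]
        if last.get(head) == a:
            s = (a, s)                      # pop down: X -> Xa
        if head not in empty:
            s = (head, s)
        if first.get(head) == b:
            s = (b, s)                      # pop up: X -> bX
        return s
    if head in tvars and first.get(head) == b:
        return (b, (head,))                 # pop up: x -> bx
    return (head,) + args
```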


Lemma 2

Let \(a \ne b\) be two different unary letters. Then the uncrossing of ab is sound and if \(u = v\) has a non-empty solution \(S\) (over an alphabet \(\varSigma \)) then for appropriate non-deterministic choices the returned equation \(u' = v'\) has a non-empty solution \(S '\) (over the same alphabet \(\varSigma \)) such that \(S '(u') = S(u) \) and ab is a non-crossing pair with respect to \(S '\).

Example 3

Continuing Example 2, recall that ba is a crossing subpattern in

$$ b X a X c = baa(f(a Y c',bb(f(c,c')))) , $$

where the alphabet is \(\{a,b,c,c',f\}\), with \(ar(f) = 2\), \(ar(a) = ar(b) = 1\) and \(ar(c) = ar(c') = 0\), and \(S(X) = a (a(f({\bullet },b(b(f(c,c'))))))\) and \(S(Y) = a(a(f(c ,b(b(f(c,{\bullet }))))))\). We pop a up from X, obtaining

$$ b a X a a X c = baa(f(a Y c',bb(f(c,c')))) , $$

the solution is \(S '(X) = a(f({\bullet },bb(f(c,c'))))\) and \(S '(Y) = S(Y) \). It is easy to verify that ba is no longer crossing.

4.2 Uncrossing Chains

Suppose that some unary letter a has a crossing chain with respect to a non-empty solution \(S\). As in the case of pairs, it is easy to see that a has a crossing chain with respect to \(S\) if and only if one of the following holds for a variable \(\alpha \) and a context variable X (note that those conditions are in fact (CP1)–(CP3) for \(a = b\)):

  1. (CC1)

    \(a\alpha \) occurs in \(u = v\) and the first letter of \(S(\alpha )\) is a;

  2. (CC2)

    Xa occurs in \(u = v\) and a is the last letter of \(S(X) \);

  3. (CC3)

    \(X\alpha \) occurs in \(u = v\) and a is the last letter of \(S(X) \) as well as the first letter of \(S(\alpha )\).

The cases (CC1) and (CC2) are symmetric, while (CC3) is a composition of (CC1) and (CC2). So suppose that (CC2) holds. Then we can replace each occurrence of X in the equation \(u = v\) with Xa (implicitly changing the solution \(S(X) = ta\) to \(S(X) = t\)), but it can still happen that a is the last letter of \(S(X) \). So we keep popping down a until the last letter of \(S(X)\) is not a; in other words, we replace X with \(Xa^r\), where \(S(X) = ta^r\) and the last letter of t is not a. Then a and X can no longer satisfy condition (CC2), as the last letter of \(S '(X)\) is different from a. A symmetric action and analysis apply to (CC1), and (CC3) follows by applying the popping down for X and the popping up for \(\alpha \). To simplify the arguments, for a ground term or 1-pattern t we say that \(a^\ell \) is the a-prefix of t if \(t = a^\ell t'\) and the first letter of \(t'\) is not a (\(t'\) may be empty). Similarly, for a 1-pattern t we say that \(a^r\) is the a-suffix of t if \(t = t' a^r\) and the last letter of \(t'\) is not a (in particular, \(t'\) may be empty).

We call this procedure the uncrossing of a-chains; its formal details are given in Algorithm 2.

[Algorithm 2: pseudocode of the uncrossing of a-chains]
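A sketch along the same lines as for pairs (the guessed a-prefix and a-suffix lengths `pre`/`suf` replace the nondeterminism; all names are illustrative):

```python
def chain(a, n, t):
    """The chain a^n put on top of the term t."""
    for _ in range(n):
        t = (a, t)
    return t

def uncross_chains(t, a, cvars, tvars, pre, suf, empty):
    """Rewrite one side for the uncrossing of a-chains: replace X by
    a^pre[X] X a^suf[X] and x by a^pre[x] x; drop emptied variables."""
    head, args = t[0], t[1:]
    args = tuple(uncross_chains(s, a, cvars, tvars, pre, suf, empty)
                 for s in args)
    if head in cvars:
        s = chain(a, suf.get(head, 0), args[0])
        if head not in empty:
            s = (head, s)
        return chain(a, pre.get(head, 0), s)
    if head in tvars:
        return chain(a, pre.get(head, 0), (head,))
    return (head,) + args
```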

Lemma 3

The uncrossing of a-chains is sound and if \(u = v\) has a non-empty solution \(S\) (over an alphabet \(\varSigma \)) then for appropriate non-deterministic choices the returned equation \(u' = v'\) has a non-empty solution \(S '\) (over the same alphabet \(\varSigma \)) such that \(S '(u') = S(u) \) and a has no crossing chains with respect to \(S '\).

Example 4

Continuing Example 2:

$$ b X a X c = baa(f(a Y c',bb(f(c,c')))) , $$

where the alphabet is \(\{a,b,c,c',f\}\), with \(ar(f) = 2\), \(ar(a) = ar(b) = 1\) and \(ar(c) = ar(c') = 0\), and \(S(X) = aa(f({\bullet },bb(f(c,c'))))\) and \(S(Y) = aa(f(c,bb(f(c,{\bullet }))))\).

There are crossing a-chains, because aY is a subpattern and a is the first letter of \(S(Y) \). We pop the a-prefixes up from X and Y; there are no a-suffixes to pop down. The instance is now

$$ b a a X a a a X c = b a a(f(a a a Y c',bb(f(c,c')))) , $$

where \(S '(X) = f({\bullet },bb(f(c,c')))\) and \(S '(Y) = f(c ,bb(f(c,{\bullet })))\). It is easy to verify that a has no crossing chains.

4.3 Uncrossing Father-Leaf Subpattern

We now show how to uncross a father-leaf subpattern (f, i, c). It is easy to observe that the father-leaf subpattern (f, i, c) is crossing (with respect to a non-empty \(S\)) if and only if one of the following holds for some context variable X and term variable y:

  1. (CFL 1)

    f with an i-th son y occurs in \(u = v\) and \(S(y) = c\);

  2. (CFL 2)

    Xc occurs in \(u = v\) and the last letter of \(S(X)\) is f and \({\bullet }\) is its i-th child;

  3. (CFL 3)

    Xy occurs in \(u = v\), \(S(y) = c\) and f is the last letter of \(S(X)\) and \({\bullet }\) is its i-th child.

We want to ‘pop-up’ c and ‘pop-down’ f. Popping up c is easy (we replace y with c); popping-down f is more complex. Let us first present the intuition:

  • In (CFL1) we pop up the letter c from y, which in this case means that we replace each occurrence of y in the equation with \(c = S(y) \). Since y is no longer in the context equation, we can restrict the solution so that it does not assign any value to y.

  • In (CFL2) we pop down the letter f: let \(S(X) = sf(t_1,\ldots ,t_{i-1},{\bullet },t_{i},\ldots ,t_{m-1})\), where s is a 1-pattern, each \(t_j\) is a ground term and \(ar(f) = m\). Then we replace each X with \(Xf(x_1,x_2,\ldots ,x_{i-1},{\bullet },x_{i},\ldots ,x_{m-1})\), where \(x_1,\ldots ,x_{m-1}\) are fresh term variables. In this way we implicitly modify the solution \(S(X) = s(f(t_1,t_2,\ldots ,t_{i-1},{\bullet },t_{i},\ldots ,t_{m-1}))\) to \(S '(X) = s\) and add \(S '(x_j) = t_j\) for \(j = 1,\ldots ,m-1\). If \(S '(X)\) is empty, we remove X from the equation.

  • The third case (CFL3) is a combination of (CFL1)–(CFL2), in which we need to pop-down from X and pop up from y.

We call this procedure the uncrossing of (f, i, c); its formal description is given in Algorithm 3.

[Algorithm 3: pseudocode of the uncrossing of (f, i, c)]
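A sketch of this procedure (with the guesses explicit: `pop_up` holds the term variables y guessed to satisfy S(y) = c, `pop_down` the context variables whose substitution is guessed to end with f with the hole as i-th child, and `fresh[X]` lists the m-1 fresh term variables shared by all occurrences of X; names are illustrative):

```python
def uncross_leaf(t, f, i, m, c, pop_down, pop_up, empty, fresh):
    """Rewrite one side for the uncrossing of (f, i, c), ar(f) = m:
    replace y by c and X(t) by X(f(x_1, ..., t, ..., x_{m-1}))."""
    head, args = t[0], t[1:]
    args = tuple(uncross_leaf(s, f, i, m, c, pop_down, pop_up, empty, fresh)
                 for s in args)
    if head in pop_up:                      # term variable with S(y) = c
        return (c,)
    if head in pop_down:                    # context variable, pop f down
        xs = tuple((x,) for x in fresh[head])
        s = (f,) + xs[:i - 1] + (args[0],) + xs[i - 1:]
        return s if head in empty else (head, s)
    return (head,) + args
```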

Lemma 4

Let f be a letter with \(ar(f) \ge 1\) and let c be a constant; then the uncrossing of (f, i, c) is sound and if \(u = v\) has a non-empty solution \(S\) (over an alphabet \(\varSigma \)) then for appropriate non-deterministic choices the returned equation \(u' = v'\) has a non-empty solution \(S '\) (over the same alphabet \(\varSigma \)) such that \(S '(u') = S(u) \) and there is no crossing father-leaf subpattern (f, i, c) with respect to \(S '\).

Example 5

Continuing Example 2, the equation

$$ b X a X c = baa(f( a Y c',bb(f(c,c')))) , $$

where the alphabet is \(\{a,b,c,c',f\}\), with \(ar(f) = 2\), \(ar(a) = ar(b) = 1\) and \(ar(c) = ar(c') = 0\), has a solution \(S(X) = a a(f({\bullet },bb(f(c,c'))))\) and \(S(Y) = aa(f(c,bb(f(c,{\bullet }))))\). The subpattern \((f,2,c')\) is crossing, as \(Yc'\) is a subpattern, the last letter of \(S(Y)\) is f and the hole \({\bullet }\) is the second child of this f. We uncross it by popping down from Y; the instance is now

$$ b X a X c = b a a(f(a Y (f(y,c')),bb(f(c,c')))) $$

where \(S '(X) = S(X) \), \(S '(Y) = aa(f(c,bb({\bullet })))\) and \(S '(y) = c\). It is easy to see that \((f,2,c')\) is now non-crossing.

5 The Algorithm

In its main part, the algorithm iterates the following operation: it nondeterministically chooses to perform one of the compressions: ab compression, a-chain compression or (f, i, c) compression, where a, b, c, f are some letters of appropriate arity. It then nondeterministically chooses whether this pattern is crossing or not; if it is, it performs the appropriate uncrossing. Then it performs the subpattern compression for the chosen pattern p and adds the new letter (or letters, for chain compression) to \(\varSigma \). We call one iteration of the main loop a phase.

[Pseudocode of the main algorithm]
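Composing the sketches from Sects. 3 and 4, one phase for a pair ab looks as follows (chain and leaf compression are analogous; `g` bundles all the nondeterministic guesses as explicit data, e.g. a `types.SimpleNamespace`; names are illustrative):

```python
def phase_pair(lhs, rhs, g):
    """One phase of the main loop for the pair g.a, g.b: uncross if the
    pair was guessed to be crossing, then compress into the fresh g.c."""
    if g.crossing:
        lhs = uncross_pair(lhs, g.a, g.b, g.cvars, g.tvars,
                           g.first, g.last, g.empty)
        rhs = uncross_pair(rhs, g.a, g.b, g.cvars, g.tvars,
                           g.first, g.last, g.empty)
    return compress_pair(lhs, g.a, g.b, g.c), compress_pair(rhs, g.a, g.b, g.c)
```

The soundness argument (Lemmas 1 and 2) is exactly the statement that wrong guesses can only lose solutions, never create them.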

The extended algorithm works in the same way, except that at the beginning of each iteration it removes from the alphabet the letters that are neither from the original alphabet nor present in the current context equation. It is easy to show that such removal does not change the satisfiability of the given equation.

Theorem 1

Both the algorithm and its extended variant store an equation of length \(\mathcal {O}(n^2k^2)\), where n is the size of the input equation and k is the maximal arity of symbols in the input alphabet. They non-deterministically solve context equations, in the sense that:

  • if the input equation is not satisfiable then they never return ‘YES’;

  • if the input equation is satisfiable then for some nondeterministic choices they return ‘YES’ within \(\mathcal {O}(n^3k^3 \log N)\) phases, where N is the size of a size-minimal solution.

As a corollary we get an upper bound on the computational complexity of context unification.

Theorem 2

Context unification is in PSPACE.

6 Space Bounds

While the soundness of the algorithm follows from the soundness of its subprocedures, the space bounds, and so the termination, remain to be shown.

6.1 General Bounds

First, we recall the following known bound on the size of a-chains in size-minimal solutions. It ensures that we can compress the chains in polynomial space.

Lemma 5

(Exponent of periodicity bound [28]). Let \(S\) be a size-minimal solution of a context equation \(u = v\) (for an alphabet \(\varSigma \)). Suppose that \(S(X)\) (or \(S(x)\)) can be written as \(t a^m t'\), where \(t, t'\) are 1-patterns (or \(t'\) is a ground term, respectively) and a is a unary letter. Then \(m = 2^{\mathcal {O}(|u|+|v|)}\).

Now we bound the number of occurrences of variables during the algorithm. Note that this bound holds for all nondeterministic choices.

Lemma 6

The number of occurrences of context variables during the run of the algorithm is at most n. The number of occurrences of term variables is at most nk.

The bound for context variables is obvious, as we never introduce new ones. For term variables, observe that during the (f, i, c) uncrossing we introduce new term variables by popping them from context variables. However, it can be shown that when we pop new term variables from X, all term variables previously introduced by X have already been removed. This yields the bound.

As a next step, we estimate the number of different crossing subpatterns. This follows by a simple argument: each such pattern can be associated with the first or last letter of some variable.

Lemma 7

For an equation \(u = v\) during the run of the algorithm and its solution \(S\), the number of different crossing subpatterns of the form a, ab, (f, i, c) is at most \(n(k+1)\).

We can also limit the number of new letters introduced during the uncrossing. Again, this follows by a simple calculation.

Lemma 8

Uncrossing and compression of a subpattern introduces at most \(n(k+1)\) letters to the equation.

6.2 Strategy

The strategy for making the nondeterministic choices is simple: if there is a non-crossing pattern, then we compress it, as this decreases both the size of the equation and the size of the minimal solution.

If there is none, then we choose a pattern whose compression makes the equation as small as possible (after this one compression). As there are only \(n(k+1)\) candidates, see Lemma 7, one of them appears roughly \((|u|+|v|)/(n(k+1))\) many times. Its compression removes about \((|u|+|v|)/(2n(k+1))\) letters and introduces at most \(n(k+1)\) new letters, see Lemma 8. This shows that we never exceed the quadratic bound on \(|u| + |v|\) given in Theorem 1; the calculation below makes this explicit.
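To make the bookkeeping explicit (a back-of-the-envelope version of the above, with unoptimised constants), write \(m = |u| + |v|\); after such a phase the new length \(m'\) satisfies

$$ m' \le m - \frac{m}{2n(k+1)} + n(k+1) , $$

and the right-hand side is at most m as soon as \(m \ge 2n^2(k+1)^2\), so the length of the equation indeed stays \(\mathcal {O}(n^2k^2)\).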

If we additionally make the choices so as to minimize the size of the solution, then we can guarantee termination after a number of phases depending on \(\log N\) (and not N), as claimed in Theorem 1.

7 Detailed Example

We now run the algorithm on Example 2, see also Fig. 2. Recall the equation:

$$ b X a X c = b a (a(f(a Y c',b(b(f(c,c')))))) , $$

where the alphabet is \(\varSigma = \{a,b,c,c',f\}\), with \(ar(f) = 2\), \(ar(a) = ar(b) = 1\) and \(ar(c) = ar(c') = 0\), and \(S(X) = a (a(f({\bullet },b(b(f(c,c'))))))\) and \(S(Y) = a(a(f(c ,b(b(f(c,{\bullet }))))))\). There is no need to preprocess the alphabet.

Fig. 2. An illustration of running the algorithm on the instance from Example 2. The figure presents the term obtained on both sides under the substitution; the values of the substitutions for the variables are depicted: with a dashed line for X, a dotted one for Y, and later also a dash-dotted one for x. The changes are according to the consecutive actions described in this section: b-chain compression, uncrossing of (f, 1, c), compression of (f, 1, c), compression of \(b'f'\), uncrossing and compression of \((b'',1,c')\), uncrossing and compression of a-chains, compression of \(ba_2\), uncrossing and compression of \((f,2,c'')\), compression of \(b''f''\), compression of \(b'''a_3\), uncrossing and compression of \(b''''f'\) and finally compression of \((f'',1,c'')\).

As b has no crossing chains, we compress its chains, obtaining

$$ b X a X c = b a a(f(a Y c',b'(f(c,c')))) , $$

where the alphabet is \(\varSigma \cup \{b'\}\), with \(ar(b') = 1\), and the solution is \(S(X) = a a(f({\bullet },b'(f(c,c'))))\) and \(S(Y) = aa(f(c ,b'(f(c,{\bullet }))))\). Now every potential subpattern for compression is crossing. We choose (f, 1, c) for compression and uncross it. To this end we pop f down from X. Note that, according to the algorithm, f is not popped down from Y: even though f is its last letter, it does not take part in any crossing occurrence of the subpattern (f, 1, c). The instance is now

$$ b X (f(a X f(c, x),x)) = b a a (f(a Y c',b'(f(c,c')))) , $$

with a solution \(S(X) = aa\), \(S(Y) = aa(f(c ,b'(f(c,{\bullet }))))\) and \(S(x) = b'(f(c,c'))\). We compress (f, 1, c), obtaining

$$ b X (f(a X f'x,x)) = b a a (f(a Y c',b'f'c')) , $$

with a solution \(S(X) = aa\), \(S(Y) = aaf'b'f'\), \(S(x) = b'f'c'\); here \(f'\), of arity 1, is added to the alphabet, which is now \(\varSigma \cup \{b',f'\}\). Note that there is a new term variable x. Now \(b'f'\) is noncrossing, so we compress it (into \(b''\) of arity 1), obtaining

$$ b X (f(a X f'x,x)) = b a a (f(a Y c',b''c')) , $$

with a solution \(S(X) = aa\), \(S(Y) = aaf'b''\) and \(S(x) = b''c'\). The letter \(b'\) is no longer used, so it is removed from the alphabet, which is now \(\varSigma \cup \{b'', f'\}\). We choose to compress \(b''c'\), as a \((b'',1,c')\) subpattern; we first uncross it by popping \(b''\) down from Y and then we compress it into a new constant \(c''\); the letter \(b''\) can be removed. So the alphabet is now \(\varSigma \cup \{f', c''\}\) and the equation is

$$ b X (f(a X f'x,x)) = b a a (f(a Y c'',c'')) , $$

with a solution \(S(X) = aa\), \(S(Y) = aaf'\) and \(S(x) = c''\).

Now we uncross and compress a-chains: in the uncrossing we replace X with \(a^2\) (its substitution becomes empty, so X is removed) and pop up \(a^2\) from Y. Afterwards we replace \(a^2\) with \(a_2\) and \(a^3\) with \(a_3\). The alphabet is now \(\varSigma \cup \{f', c'', a_2, a_3\}\) and the equation is

$$ b a_2 (f(a_3 f'x,x)) = b a_2 (f(a_3 Y c'',c'')) , $$

with a solution \(S(Y) = f'\) and \(S(x) = c''\).

Now we compress the noncrossing \(ba_2\) into \(b''\) (note that we reuse the letter \(b''\)), \(a_2\) is removed from the alphabet, which is now \(\varSigma \cup \{b'', f', c'', a_3\}\) and the equation is

$$ b'' (f(a_3 f'x,x)) = b'' (f(a_3 Y c'',c'')) , $$

with a solution \(S(Y) = f'\) and \(S(x) = c''\). Now we choose to compress \((f,2,c'')\), so we pop up \(c''\) from x (thus removing x from the equation). After the compression, the alphabet is \(\varSigma \cup \{b'', f', f'', c'', a_3\}\) and the equation is \( b'' f''a_3 f'c'' = b'' f''a_3 Y c''\), with a solution \(S(Y) = f'\).

We compress the noncrossing \(b''f''\) (into \(b'''\)), obtaining the alphabet \(\varSigma \cup \{b''', f', c'', a_3\}\); the equation is \( b'''a_3 f'c'' = b''' a_3 Y c''\), with a solution \(S(Y) = f'\). We then compress \(b'''a_3\) (into \(b''''\)), which is noncrossing, yielding an equation \(b''''f'c'' = b''''Y c''\), with a solution \(S(Y) = f'\) over an alphabet \(\varSigma \cup \{b'''', f', c''\}\).

Now we uncross \(b''''f'\) by replacing Y with \(f'\) (its substitution becomes empty) and compress it (into \(f''\)), obtaining a trivial equation \(f'' c'' = f'' c''\) over an alphabet \(\varSigma \cup \{f'', c''\}\), for which we perform the final compression \((f'',1,c'')\), yielding an equation \(c''' = c'''\).

8 Regular Constraints

We now consider the problem of context unification with regular constraints. In this setting, the input contains (some description of) regular tree languages \(L_{1},\ldots , L_{\ell }\) of ground terms and/or 1-patterns over the input alphabet \(\varSigma \). Those languages are used for enriching the equation with additional constraints of the form \(\alpha \in L_i\), meaning that the substitution for the variable \(\alpha \) should be from the language \(L_i\). Naturally, those languages have to be specified in some way; we choose one, see Sect. 8.2, but other natural descriptions are equivalent, see the discussion at the end of that section.

Context unification with regular constraints was investigated mostly because linear second order unification [12] and context unification with regular constraints reduce to each other [16]; note that those reductions are not polynomial-time, so they cannot be used directly to establish the computational complexity of linear second order unification. On the other hand, adding constraints to unification is interesting and important on its own.

To generalise the algorithm to this setting, we assign to each letter in the alphabet its transition function; such transition functions generalise to patterns, so in particular to substitutions for variables. Transition functions and vectors of such transitions are defined in Sect. 8.1, and we explain how to use them to define constraints for context unification in Sect. 8.2. When we compress a certain subpattern into one letter, we compose those transition functions. When we pop letters from a variable we assign to the variable a new transition function, so that the composition of the transition functions of the popped letters and the new transition function for the variable is equal to the old transition function for the variable.

However, several simplifications do not work in this setting. This is for a reason: it is known that the non-emptiness problem for intersection of (deterministic) finite tree automata is EXPTIME-complete [5] and we can easily encode this problem within context unification with regular constraints, so we cannot hope to extend our algorithm to this setting without affecting the computational complexity.

To resolve this problem, we extend the input alphabet by adding to it one letter \(f_\varDelta \) for every possible vector of transition functions \(\varDelta \) (we limit the allowed arities, though, to the maximal arity of letters in the input alphabet). We do not store this alphabet explicitly; instead we use an (EXPTIME) oracle to decide whether a letter belongs to the alphabet or not. It is easy to see that a context equation is equisatisfiable over its input alphabet and over such an extended alphabet. Later on, for any equation, we consider its solutions over an alphabet consisting of the extended alphabet and the letters present in the equation.

Ultimately, for the generalisations of the two algorithms of Theorem 1 to the scenario with regular constraints, we want to show the following theorem.

Theorem 3

Both generalised algorithms keep an equation of size \(\mathcal {O}(n^2k^2)\). Given an unsatisfiable context equation with regular constraints they never return ‘YES’. Given a satisfiable one with a size-minimal solution of size N they return ‘YES’ after \(\mathcal {O}(n^2k^2 \log N)\) compression steps, for appropriate non-deterministic choices.

  The extended variant uses an oracle for the intersection of tree-regular languages, which can always be implemented in EXPTIME. Except for that, it runs in PSPACE.

As a corollary, we get an EXPTIME bound on the satisfiability of context unification with regular constraints.

Corollary 1

Context unification with regular constraints is EXPTIME-complete.

8.1 Tree Automata

A tree automaton is defined as a triple \((\varSigma ,Q,\{\delta _f\}_{f \in \varSigma })\), where \(\varSigma \) is a ranked alphabet, Q is a set of states and each \(\delta _f\) is a transition relation (sometimes called a transition function for historical reasons) of a letter from \(\varSigma \). To be more precise, when \(ar(f) = r\) then \(\delta _f \subseteq Q^r \times Q\). The meaning of the transition functions is that we consider labellings of ground terms with states (such labellings are usually called runs) such that a node labelled with f whose children are assigned states \((q_1, q_2, \ldots , q_r)\) can be assigned any state q such that \((q_1, q_2, \ldots , q_r,q) \in \delta _f\). Thus we think of \(\delta _f\) as a nondeterministic transition function, and by \(\delta _f(q_1, \ldots , q_r)\) we denote the set of states \(\{q : (q_1, \ldots , q_r, q) \in \delta _f\}\).

We can treat a letter f as a pattern with a unique non-parameter node f; in this way we can define \(\delta _p\) for arbitrary patterns: given an r-pattern p, the tuple \((q_1, \ldots , q_r, q)\) is in \(\delta _p\) if and only if there is a run for p in which the parameter nodes \({\bullet }_1, \ldots , {\bullet }_r\) are labelled with \(q_1, \ldots , q_r\) and the root of p is labelled with q. Note that this implicitly gives a rule for composing transition functions; such composition is associative in the case of 1-patterns.

Concerning the notation, we will explicitly compose only transition functions for patterns that occur during the subpattern compression. Thus for unary patterns \(p_1, p_2, \ldots , p_\ell \), by \(\delta _{p_1}\delta _{p_2}\dots \delta _{p_\ell }\) we denote the transition function of the pattern \(p_1p_2\dots p_\ell \), and when \(p_1 = p_2 = \dots = p_\ell \) then we denote this function by \((\delta _{p_1})^\ell \). Similarly, for an r-ary pattern f and ground terms \(t_1, t_2, \ldots , t_m\), by \(\delta _f[{\bullet }_{i_1}/\delta _{t_1}, {\bullet }_{i_2}/\delta _{t_2}, \ldots , {\bullet }_{i_m}/\delta _{t_m}]\) we denote the transition function of the pattern obtained by replacing \({\bullet }_{i_1}, {\bullet }_{i_2}, \ldots , {\bullet }_{i_m}\) by \(t_1, t_2, \ldots , t_m\), respectively. Note that it could be that \(m < r\), i.e. not all parameters of the pattern f are substituted by ground terms.

For a fixed sequence of automata \(A_1, A_2, \ldots , A_\ell \) (and it is fixed for a fixed instance of context unification with regular constraints) with transition functions \(\{\delta _f^1\}_{f\in \varSigma },\{\delta _f^2\}_{f\in \varSigma },\ldots , \{\delta _f^\ell \}_{f\in \varSigma }\), by \(\varDelta _p\) we denote the tuple of transition functions \((\delta ^1_p,\delta ^2_p,\ldots , \delta ^\ell _p)\) for a pattern p; this is the vector of transitions of this pattern. Note that this is a vector of sets. We denote vectors of transitions by \(\varDelta , \varDelta _1, \ldots \), consider them even without underlying patterns, and speak of an r-vector of transitions when it is a vector of transitions of an r-ary pattern.

We extend the composition of transition functions and its notation to vectors of transitions in a natural way, i.e. we perform the appropriate operation coordinate-wise on each transition function.
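As a small sketch of these compositions (Python; states are arbitrary hashable values, the transition relation of a 1-pattern is a set of pairs (parameter state, root state) and that of an m-ary letter a set of (m+1)-tuples; all names are illustrative):

```python
def compose_unary(d_p1, d_p2):
    """Transition relation of the 1-pattern p1 p2 (that is, p1(p2(.)))
    from the relations of p1 and p2."""
    return {(q, r) for (q, s) in d_p2 for (s2, r) in d_p1 if s == s2}

def absorb(d_f, i, d_t):
    """delta_f[ bullet_i / delta_t ]: plug a ground term t, whose set of
    root states is d_t, into the i-th parameter (1-based) of a letter f
    whose relation d_f is a set of tuples (q_1, ..., q_m, q)."""
    return {tup[:i - 1] + tup[i:] for tup in d_f if tup[i - 1] in d_t}
```

Vectors of transitions are then composed by mapping these operations coordinate-wise over the \(\ell \) automata.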

8.2 Context Unification with Regular Constraints

Now we are ready to define the problem of context unification with regular constraints. As an input we are given a finite alphabet \(\varSigma \), finite automata \(A_1, A_2, \ldots , A_\ell \) over \(\varSigma \) (with state sets \(Q_1, Q_2, \ldots , Q_\ell \) and transition functions \(\{\delta ^1_f\}_{f \in \varSigma }, \{\delta ^2_f\}_{f \in \varSigma }, \ldots , \{\delta ^\ell _f\}_{f \in \varSigma }\)), a context equation \(u = v\) and a set of constraints on the vectors of transitions for the variables and for u. To be more precise, those constraints are:

  • term variable constraints: we are given 0-vectors of transitions \(\varDelta _x\) for some variables \(x \in \mathcal X \);

  • equation constraint: similarly, we are given a 0-vector of transitions \(\varDelta _u\) for the whole equation;

  • context variable constraints: we are given 1-vectors of transitions \(\varDelta _X\) for some context variables \(X \in \mathcal V \).

The meaning of the constraints is clear: we ask, whether there is a substitution \(S\), such that \(S(u) = S(v) \), \(\varDelta _{S(u)} = \varDelta _u\) and \(\varDelta _{S(\alpha )} = \varDelta _\alpha \) for each variable \(\alpha \).

8.3 Modifications of the Algorithm

We now explain how the algorithm is modified to handle regular constraints; we first consider the variant that enriches the alphabet with every letter that it creates.

Compression. When a subpattern p is compressed into f, we calculate its vector of transitions and set \(\varDelta _f \leftarrow \varDelta _p\).

Popping Letters. When popping letters, we guess the new vectors of transitions for the variable, so that the composition of the vectors of transitions (in the appropriate order) of the popped letters and the variable is the same as it used to be; this applies also to the popping of term variables during the uncrossing of a father-leaf subpattern. For instance, when we replace X with \(X(f(x_1,{\bullet }_2,x_3))\) then we guess new vectors \(\varDelta _X', \varDelta _{x_1}, \varDelta _{x_3}\) such that \(\varDelta _X = \varDelta _X' (\varDelta _f[{\bullet }_1/\varDelta _{x_1},{\bullet }_3/\varDelta _{x_3}])\); we add \(\varDelta _{x_1}, \varDelta _{x_3}\) to the set of constraints and update \(\varDelta _X\) to \(\varDelta _X'\). When we remove a context variable X, we need to ensure that its vector of transitions \(\varDelta _X\) is the same as \(\varDelta _{\bullet }\), i.e. it is the identity; similarly, when we replace x with c we need to validate that \(\varDelta _x = \varDelta _c\).
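Such a popping guess can be validated coordinate-wise with the helpers sketched in Sect. 8.1 (a sketch; `D_old`, `D_new`, `D_pop` stand for the vectors \(\varDelta _X\), \(\varDelta _X'\) and \(\varDelta _f[\ldots ]\), represented as lists of relations; names are illustrative):

```python
def valid_pop_guess(D_old, D_new, D_pop):
    """Check that the old vector of transitions for X equals the
    composition of the guessed new vector with that of the popped part."""
    return all(compose_unary(d_new, d_pop) == d_old
               for d_old, d_new, d_pop in zip(D_old, D_new, D_pop))
```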

Ending. When the whole equation is reduced to a trivial equation \(c = c\), we check whether the vector of transitions for c is the same as for the whole equation, i.e. \(\varDelta _u = \varDelta _c\). If so, we accept; if not, we reject.

Satisfiability. Whenever we claim that an equation is satisfiable (so, it has some solution \(S\)), we need to additionally assert that the transitions for the variables (and the whole equation) are as in the constraints kept by the instance, that is, \(\varDelta _{S(\alpha )} = \varDelta _\alpha \) for each variable \(\alpha \) and \(\varDelta _{S(u)} = \varDelta _u\).

Subprocedures. Lemma 1 holds in the new setting; to see this, it is enough to recall that during compression the new letter has the same transition function as the pattern it replaced, and for popping we always guess the popped letters and the new constraints of the variables so that the composition of their vectors of transitions is equal to the vector of transitions of the variable before the popping.

The discussion above proves Theorem 3 for this variant. The only remaining problem is that the alphabet grows and the size of the transition vectors of the involved letters can be even exponential. However, a careful inspection shows that one can define an appropriate subclass of all vectors of transitions, called reachable. They are of polynomial size and can be composed in polynomial time; moreover, each letter that occurs during the run of the algorithm has a reachable vector of transitions and vice versa: each reachable vector of transitions can be realised by a tree or a pattern over the input alphabet. Lastly, one can check in EXPTIME whether a vector of transitions is reachable. This ends the analysis for Theorem 3.

9 Open Questions

  • Computational complexity. The exact computational complexity of context unification remains unknown: the presented algorithm shows containment in PSPACE and the best known lower bound is NP, by a trivial reduction from Integer Programming. Perhaps the additional structure of terms allows showing a stronger lower bound?

  • Size of minimal solutions. An extension of the given proof shows that the smallest solution of a context equation is of at most doubly exponential size. At the same time, we know of no instance whose smallest solution is super-exponential, exactly as in the case of word equations. An exponential upper bound would imply containment in NP; a counterexample would suggest that PSPACE is the exact computational complexity of the problem.

  • Unary second order unification. The decidability status of the subproblem of second order unification in which each second-order variable has arity 1 remains unknown. The presented approach does not generalize to this case, and at the same time the existing proof of undecidability essentially requires second-order variables of rank 2.

  • One context variable. Context unification with one context variable is known to be in NP [6] and some of its fragments are in P [7, 8]. It remains an open question whether the whole problem is in P.