1 Introduction

P\(\rho \)Log is a tool that combines, on the one hand, the power of logic programming and, on the other hand, the flexibility of strategy-based conditional transformation systems. Its terms are built over function symbols without fixed arity, using four different kinds of variables: for individual terms, for sequences of terms, for function symbols, and for contexts. These variables help to traverse tree forms of expressions both in horizontal and vertical directions, in one or more steps. A powerful matching algorithm replaces several steps of recursive computation by pattern matching, which facilitates writing short and intuitively clear code. Nondeterministic computations are modeled naturally by the backtracking engine. Prolog’s meta-programming capabilities made it easy to write a compiler from P\(\rho \)Log programs (which, in fact, consist of specific Prolog code) into pure Prolog programs.

P\(\rho \)Log program clauses either define user-constructed strategies by transformation rules or are ordinary Prolog clauses. Prolog code can be used freely within P\(\rho \)Log programs, which is especially convenient when some built-in primitives, arithmetic calculations, or input-output features are needed.

P\(\rho \)Log is based on the \(\rho \)Log calculus [17] and, essentially, is its executable implementation. The inference system of the calculus is basically SLDNF-resolution, with normal logic program semantics [15]. Therefore, Prolog was a natural choice for the implementation.

Originally, the \(\rho \)Log calculus evolved from experiments with extending the language of Mathematica  [24] by a package for advanced rule-based programming  [16, 18]. Later, these developments influenced an extension of another symbolic computation system, Maple [19], by a rule-based programming package called symbtrans (an adaptation of \(\rho \)Log) used for automatic derivation of multiscale models of arrays of micro- and nanosystems, see, e.g., [2].

The \(\rho \)Log calculus has been influenced by the \(\rho \)-calculus [5], which itself is a foundation for the rule-based programming system ELAN [3]. There are other languages for programming by rules, such as ASF-SDF [21], CHR [12], Claire [4], Maude [6], Stratego [22], and Tom [1], to name just a few. The \(\rho \)Log calculus and, consequently, P\(\rho \)Log differ from them, first of all, in their pattern matching capabilities. Besides, they adopt logic programming semantics (clauses are first-class concepts; rules/strategies are expressed as clauses) and make heavy use of strategies to control transformations. In earlier papers, we showed the applicability of this approach to XML transformation and Web reasoning [7], and to modeling rewriting strategies [10]. More recently, it has been used for extracting frequent patterns from data mining workflows [20].

The application papers mentioned above naturally described the language and some features of P\(\rho \)Log, but they did not give an overview of the entire system. Moreover, there have been some new developments in the meantime: the library of built-in strategies has been modified and extended, a lighter version of P\(\rho \)Log has been implemented, and an Emacs-based development environment has appeared. Therefore, we decided to describe the current status of the tool in this paper: to explain by simple examples how it works, and to discuss the language, architecture, built-in strategies, and the development environment.

P\(\rho \)Log sources, Emacs mode, and help on built-in strategies can be downloaded from its Web page

http://www.risc.jku.at/people/tkutsia/software/prholog/.

The current version has been tested with SWI-Prolog [23] version 7.2.3 and later.

2 Overview

P\(\rho \)Log atoms are supposed to transform term sequences. Transformations are labeled by what we call strategies. Such labels (which themselves can be compound terms, not necessarily constant symbols) help to construct more complex transformations from simpler ones.

An instance of a transformation is finding duplicated elements in a sequence and removing one of them. Let us call this process double merging. The following strategy implements the idea:

$$\begin{aligned} \mathrm{{merge\_doubles}}\,{::}\,( s\_X ,\, i\_x ,\, s\_Y ,\, i\_x ,\, s\_Z ) \Longrightarrow ( s\_X ,\, i\_x ,\, s\_Y ,\, s\_Z ). \end{aligned}$$

The code, as one can see, is pretty short. merge_doubles is the strategy name. It is followed by the separator ::, which separates the strategy name from the transformation. Then comes the transformation itself, in the form \( lhs \Longrightarrow rhs \). It says that if the sequence in \( lhs \) contains duplicates somewhere (expressed by two copies of the variable \( i\_x \), which can match individual terms and is therefore called an individual variable), then only the first of the two copies should be kept in \( rhs \). That “somewhere” is expressed by three sequence variables: \( s\_X \) stands for the subsequence before the first occurrence of \( i\_x \), \( s\_Y \) takes the subsequence between the two occurrences of \( i\_x \), and \( s\_Z \) matches the remaining part. These subsequences remain unchanged in \( rhs \). In P\(\rho \)Log, variable names start with the first letter of their kind (there are four kinds of variables: individual, sequence, function, and context), followed by an underscore, after which comes the actual name. For anonymous variables, we write just \( i\_ ,\) \( s\_ ,\) \( f\_ ,\) \( c\_ \).

Note that one does not need to code the actual search process of doubles explicitly. The matching algorithm does the job instead, looking for an appropriate instantiation of the variables. There can be several such instantiations.
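To make the search concrete, the instantiations found by matching can be simulated in a short Python sketch. This is purely illustrative (P\(\rho \)Log's matching is implemented in Prolog, and the function and key names below are ours, not the tool's):

```python
# Illustrative simulation of matching the pattern (s_X, i_x, s_Y, i_x, s_Z)
# against a flat sequence: every pair of equal elements gives one matcher.

def merge_doubles_matchers(seq):
    """Yield one substitution per pair of duplicate positions (i, j)."""
    n = len(seq)
    for i in range(n):             # position matched by the first i_x
        for j in range(i + 1, n):  # position matched by the second i_x
            if seq[i] == seq[j]:
                yield {"s_X": seq[:i], "i_x": seq[i],
                       "s_Y": seq[i + 1:j], "s_Z": seq[j + 1:]}

for sigma in merge_doubles_matchers((1, 2, 3, 2, 1)):
    print(sigma)
```

On \((1,\,2,\,3,\,2,\,1)\) this yields exactly two matchers, one for the duplicated 1 and one for the duplicated 2, mirroring the two answers shown below.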

Now one can ask, e.g., to merge doubles in a number sequence \((1,\,2,\,3,\,2,\,1)\):

$$\begin{aligned} \text {?- } \mathrm{{merge\_doubles}} \,{::}\, (1,\,2,\,3,\,2,\,1) \Longrightarrow s\_Result . \end{aligned}$$

First, P\(\rho \)Log returns the substitution \(\{ s\_Result \mapsto (1,\, 2,\, 3,\, 2) \}\). Like in Prolog, the user may ask for more solutions, and, via backtracking, P\(\rho \)Log gives the second answer \(\{ s\_Result \mapsto (1,\, 2,\, 3,\, 1) \}\). Both are obtained from \((1,\,2,\,3,\,2,\,1)\) by merging one pair of duplicates.

A double-free sequence is just a normal form of this single-step \(\mathrm{{merge\_doubles}}\) transformation. P\(\rho \)Log has a built-in strategy for computing normal forms, denoted by \(\mathbf{{nf}}\), and we can use it to define a new strategy \(\mathrm{{merge\_all\_doubles}}\) in the following clause (where \(\mathop {\text {:-}}\), as in Prolog, stands for the inverse implication):

$$\begin{aligned} \mathrm{{merge\_all\_doubles}} \,{::}\, s\_X \Longrightarrow s\_Y \ \mathop {\text {:-}}\ \mathbf{{nf}}(\mathrm{{merge\_doubles}}) \,{::}\, s\_X \Longrightarrow s\_Y ,\ !. \end{aligned}$$

The effect of \(\mathbf{{nf}}\) is that it applies \(\mathrm{{merge\_doubles}}\) to \( s\_X \) iteratively, as long as possible, i.e., as long as doubles can be merged in the obtained sequences. When \(\mathrm{{merge\_doubles}}\) is no longer applicable, the normal form of the transformation has been reached. It is returned in \( s\_Y \).
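The iteration performed by \(\mathbf{{nf}}\) can be sketched in Python as follows. This is an illustrative simulation under our own representation (sequences as tuples, a deterministic single-step merge), not P\(\rho \)Log code:

```python
# Sketch of normal-form computation: apply a single-step transformation
# until it is no longer applicable, then return the last sequence.

def merge_doubles_step(seq):
    """Return the first single-step merge, or None if seq has no doubles."""
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] == seq[j]:
                return seq[:j] + seq[j + 1:]  # drop the second occurrence
    return None

def nf(step, seq):
    """Iterate step as long as it applies; the fixpoint is the normal form."""
    while (nxt := step(seq)) is not None:
        seq = nxt
    return seq

print(nf(merge_doubles_step, (1, 2, 3, 2, 1)))  # -> (1, 2, 3)
```

Here \((1,\,2,\,3,\,2,\,1)\) passes through \((1,\,2,\,3,\,2)\) before reaching the double-free normal form \((1,\,2,\,3)\).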

Note the Prolog cut at the end. It cuts the alternative ways of computing the same normal form. In fact, Prolog primitives and clauses can be used in P\(\rho \)Log programs. Now, for the query

$$\begin{aligned} \text {?- } \mathrm{{merge\_all\_doubles}} \,{::}\, (1,\,2,\,3,\,2,\,1) \Longrightarrow s\_Result . \end{aligned}$$

we get a single answer \(\{ s\_Result \mapsto (1,\,2,\,3)\}\).

Instead of the cut, we could define \(\mathrm{{merge\_all\_doubles}}\) purely in P\(\rho \)Log terms:

$$\begin{aligned}&\mathrm{{merge\_all\_doubles}} \,{::}\, s\_X \Longrightarrow s\_Y \ \mathop {\text {:-}}\\&\qquad \mathbf{{first\_one}}(\mathbf{{nf}}(\mathrm{{merge\_doubles}})) \,{::}\, s\_X \Longrightarrow s\_Y . \end{aligned}$$

\(\mathbf{{first\_one}}\) is another P\(\rho \)Log built-in strategy. It applies to a sequence of strategies, finds the first among them that successfully transforms the input sequence, and gives back just one result of that transformation. Here it has the single argument strategy \(\mathbf{{nf}}(\mathrm{{merge\_doubles}})\) and returns (by instantiating \( s\_Y \)) only one result of its application to \( s\_X \).

In the last clause, the transformation is exactly the same in the clause head and in the (singleton) body, and both have sequence variables in the left- and right-hand sides (\( s\_X \) and \( s\_Y \)). In such cases we can use a more succinct notation:

$$\begin{aligned} \mathrm{{merge\_all\_doubles}}\ \mathop {\text {:=}}\ \mathbf{{first\_one}}(\mathbf{{nf}}(\mathrm{{merge\_doubles}})). \end{aligned}$$

This form is called the strategy definition form: the strategy in its left hand side (here \(\mathrm{{merge\_all\_doubles}}\)) is defined as the strategy in its right hand side (here \(\mathbf{{first\_one}}(\mathbf{{nf}}(\mathrm{{merge\_doubles}}))\)).

P\(\rho \)Log is good not only at selecting arbitrarily many subexpressions in the “horizontal direction” (by sequence variables), but also at working in the “vertical direction”, selecting subterms at arbitrary depth. Context variables provide this flexibility by matching the context above the subterm to be selected. A context is a term with a single “hole” in it. When a context is applied to a term, the latter is “plugged into” the hole, replacing it. Syntactically, the hole is denoted by a special constant. In the P\(\rho \)Log system it is hole, but here in the paper we use the more conventional notation \(\bullet \). There is yet another kind of variable, called a function variable, which stands for a function symbol. With the help of these constructs and the \(\mathrm{{merge\_doubles}}\) strategy, it is pretty easy to define a transformation that merges double branches in a tree, represented as a term:

$$\begin{aligned}&\mathrm{{merge\_double\_branches}}\, {:}{:}\\&\qquad c\_Context ( f\_Fun ( s\_X )) \Longrightarrow c\_Context ( f\_Fun ( s\_Y )) \ \mathop {\text {:-}}\\&\qquad \mathrm{{merge\_doubles}} \,{:}{:}\, s\_X \Longrightarrow s\_Y . \end{aligned}$$

Here \( c\_Context \) is a context variable and \( f\_Fun \) is a function variable. Now, we can ask to merge double branches in a given tree:

$$\begin{aligned}&\text {?- } \mathrm{{merge\_double\_branches}} \,{:}{:} \\&\qquad f(g(a,b,a,h(c,c)),\, h(c),\, g(a,a,b,h(c))) \Longrightarrow i\_Result . \end{aligned}$$

P\(\rho \)Log returns three results, one after the other, by backtracking:

$$\begin{aligned}&\{ i\_Result \mapsto f(g(a, b, h(c, c)),\, h(c),\, g(a, a, b, h(c))) \},\\&\{ i\_Result \mapsto f(g(a, b, a, h(c)),\, h(c),\, g(a, a, b, h(c))) \},\\&\{ i\_Result \mapsto f(g(a, b, a, h(c, c)),\, h(c),\, g(a, b, h(c))) \}. \end{aligned}$$

To obtain the first one, P\(\rho \)Log matched the context variable \( c\_Context \) to the context \(f(\bullet ,\, h(c),\, g(a, a, b, h(c)))\), the function variable \( f\_Fun \) to the function symbol g, and the sequence variable \( s\_X \) to the sequence \((a,\, b,\, a,\, h(c, c))\). \( \mathrm{{merge\_doubles}}\) transformed \((a,\, b,\, a,\, h(c, c))\) into \((a,\, b,\, h(c, c))\). The other results have been obtained by taking different contexts and respective subbranches.
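The search over contexts and subbranches can be simulated in Python. The following sketch is illustrative only (terms are represented as nested tuples with the function symbol first, constants as strings; this representation is ours, not P\(\rho \)Log's). It reproduces the three results above in the same order:

```python
# Illustrative simulation of merge_double_branches: merge one pair of
# duplicate arguments of one subterm, at any depth, leaving the context
# above it unchanged.

def merge_double_branches(term):
    """Yield every tree obtained by one single-step merge below some node."""
    if isinstance(term, str):      # a constant has no arguments to merge
        return
    head, args = term[0], list(term[1:])
    # merge a pair of duplicates directly below this node ...
    for i in range(len(args)):
        for j in range(i + 1, len(args)):
            if args[i] == args[j]:
                yield (head, *args[:j], *args[j + 1:])
    # ... or descend, plugging the transformed subterm back into its context
    for k, sub in enumerate(args):
        for new_sub in merge_double_branches(sub):
            yield (head, *args[:k], new_sub, *args[k + 1:])

t = ('f', ('g', 'a', 'b', 'a', ('h', 'c', 'c')),
          ('h', 'c'),
          ('g', 'a', 'a', 'b', ('h', 'c')))
for r in merge_double_branches(t):
    print(r)
```

For the tree \(f(g(a,b,a,h(c,c)),\, h(c),\, g(a,a,b,h(c)))\), the generator produces exactly three trees, corresponding to the three answer substitutions.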

The right-hand side of a transformation in a query need not be a variable. One can have an arbitrary sequence there. For instance, we may be interested in trees that contain \(h(c,c)\):

$$\begin{aligned}&\text {?- } \mathrm{{merge\_double\_branches}} \,{:}{:} \\&\qquad f(g(a,b,a,h(c,c)),\, h(c),\, g(a,a,b,h(c))) \Longrightarrow c\_C (h(c,c)). \end{aligned}$$

We get here two answers, which show instantiations of \( c\_C \) by the relevant contexts:

$$\begin{aligned}&\{ c\_C \mapsto f(g(a, b, \bullet ),\, h(c),\, g(a, a, b, h(c))) \}, \\&\{ c\_C \mapsto f(g(a, b, a, \bullet ),\, h(c),\, g(a, b, h(c))) \}. \end{aligned}$$

Similar to merging all doubles in a sequence above, we can also define a strategy that merges all identical branches in a tree repeatedly. It is not surprising that the built-in strategy for normal forms plays a role also here:

$$\begin{aligned}&\mathrm{{merge\_all\_double\_branches}}\ \mathop {\text {:=}}\ \mathbf{{first\_one}}(\mathbf{{nf}}(\mathrm{{merge\_double\_branches}})). \end{aligned}$$

For the query

$$\begin{aligned}&\text {?- } \mathrm{{merge\_all\_double\_branches}} \,{:}{:} \\&\qquad f(g(a,b,a,h(c,c)),\, h(c),\, g(a,a,b,h(c))) \Longrightarrow s\_Result . \end{aligned}$$

we get a single answer \(\{ s\_Result \mapsto f(g(a, b, h(c)), h(c)) \}\).

Finally, note that a strategy can be defined by several clauses, which are treated as alternatives.

3 The P\(\rho \)Log Language

From the brief overview above one can get a pretty clear idea about the P\(\rho \)Log language: Its terms and sequences are constructed from function symbols that do not have a fixed arity (variadic, aka unranked, function symbols), using the four kinds of variables. The constant hole is the exception: it is always used without arguments. More precisely, terms are either individual variables or expressions of one of the following forms: \(f(\tilde{s})\), \( f\_F (\tilde{s})\), or \( c\_C (t)\), where f is an unranked function symbol, t is a term, and \(\tilde{s}\) is a finite (possibly empty) sequence of terms or sequence variables. (Such sequences are sometimes called hedges.) The empty sequence is denoted in the system by \(\text{ eps }\) (for \(\epsilon \)), but we use the more conventional notation \((\,)\) in the paper. Two sequences can be concatenated into one, with the empty sequence playing the role of the unit element of this (meta-level) concatenation operation. Sequences are flat and are written in parentheses for easy parsing (when they contain more than one element). A singleton sequence is identified with its sole element. Contexts are terms with a unique occurrence of the hole. The previous section contains several examples of terms, sequences, and contexts.

Substitutions map individual variables to terms, sequence variables to sequences, function variables to function symbols or function variables, and context variables to contexts. For example, \(\{ c\_Ctx \mapsto f(\bullet ),\, i\_Term \mapsto g( s\_X ),\, f\_Fun \mapsto g,\, s\_H _1 \mapsto (\,),\, s\_H_2 \mapsto (b, c)\}\) is a substitution. We can apply substitutions to sequences, which gives sequences as a result. In particular, if the sequence is a singleton term, then the result of the application is also a term. Applying the substitution above to the sequence \(( c\_Ctx ( i\_Term ),\, f\_Fun ( s\_H_1 ,a, s\_H_2 ))\) gives the sequence \((f(g( s\_X )), g(a,b,c))\).

Note that sequence variables are not terms, and context variables always apply to terms, not to arbitrary sequences. This makes terms and contexts closed under substitution application.
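Substitution application for the example above can be sketched in Python under our tuple representation of terms (an illustrative fragment; the uninstantiated variable \( s\_X \) is simply kept as the string 's_X'):

```python
# Hedged sketch of substitution application. Terms are nested tuples
# (function symbol first, then arguments); constants and leftover variables
# are strings; a context contains the constant "hole" exactly once.

def plug(ctx, term):
    """Apply a context to a term: replace the hole in ctx by term."""
    if ctx == "hole":
        return term
    if isinstance(ctx, str):
        return ctx
    return (ctx[0],) + tuple(plug(a, term) for a in ctx[1:])

# the substitution from the text: s_H1 -> empty sequence, s_H2 -> (b, c)
subst = {"c_Ctx": ("f", "hole"), "i_Term": ("g", "s_X"),
         "f_Fun": "g", "s_H1": (), "s_H2": ("b", "c")}

# applying it to the sequence (c_Ctx(i_Term), f_Fun(s_H1, a, s_H2)):
result = (plug(subst["c_Ctx"], subst["i_Term"]),
          (subst["f_Fun"], *subst["s_H1"], "a", *subst["s_H2"]))
print(result)  # -> (('f', ('g', 's_X')), ('g', 'a', 'b', 'c'))
```

Note how the sequence variables are spliced into the argument list (the empty sequence simply disappears), which is exactly the flattening described above.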

The main computational mechanism of P\(\rho \)Log is matching. Due to sequence and context variables, it is finitary: a matching problem has finitely many solutions. For instance, the sequence \(( s\_X ,\, i\_x ,\, s\_Y , i\_x ,\, s\_Z )\) matches \((1,\,2,\,3,\,2,\,1)\) in two different ways:

  • \(\{ s\_X \mapsto (\,),\, i\_x \mapsto 1,\, s\_Y \mapsto (2,3,2),\, s\_Z \mapsto (\,)\}\),

  • \(\{ s\_X \mapsto 1,\, i\_x \mapsto 2,\, s\_Y \mapsto 3,\, s\_Z \mapsto 1\}\).

In the previous section, we also saw two solutions to the problem of matching \( c\_C (h(c,c))\) to the result of applying the strategy merge_double_branches to the term \(f(g(a,b,a,h(c,c)),h(c),\, g(a,a,b,h(c)))\).

A \(\rho \)Log atom (\(\rho \)-atom) is a triple consisting of a strategy \( st \) (which is a term) and two (hole-free) sequences \(\tilde{s}_1\) and \(\tilde{s}_2\), written as \( st \,{:}{:}\, \tilde{s}_1 \Longrightarrow \tilde{s}_2\). Its negation is written as \( st \,{:}{:}\, \tilde{s}_1 \not \Longrightarrow \tilde{s}_2\). A \(\rho \)Log literal (\(\rho \)-literal) is a \(\rho \)-atom or its negation. A P\(\rho \)Log clause is either a Prolog clause or a clause of the form \( st \,{:}{:}\, \tilde{s}_1 \Longrightarrow \tilde{s}_2 \ \mathop {\text {:-}}\ body \) (called a \(\rho \)-clause), where body is a (possibly empty) conjunction of \(\rho \)- and Prolog literals. Strategy definitions \( str _1 := str _2\) are shortcuts for clauses of the form \( str _1 \,{:}{:}\, s\_X \Longrightarrow s\_Y \ \mathop {\text {:-}}\ str _2 \,{:}{:}\, s\_X \Longrightarrow s\_Y \).

In fact, P\(\rho \)Log clauses may have a more complex structure, when (some of) the literals are equipped with membership constraints, constraining possible values of sequence and context variables. Such constraints are taken into account in the matching process. For simplicity, we do not consider them in this paper.

A P\(\rho \)Log program is a sequence of P\(\rho \)Log clauses. A query is a conjunction of \(\rho \)- and Prolog literals. A restriction on variable occurrences is imposed: \(\rho \)-clauses and queries can contain only P\(\rho \)Log variables, while Prolog clauses and queries can contain only Prolog variables. If a \(\rho \)-clause or a query contains a Prolog literal, the only variables that can occur in that literal are P\(\rho \)Log individual variables. (When it comes to evaluating such Prolog literals, the individual variables are converted into Prolog variables.) A detailed description of the P\(\rho \)Log syntax can be found in the technical report [11] and on its Web page.

We need to make sure that all solving problems that arise for P\(\rho \)Log clauses and queries in the program execution process are matching problems, not unification problems. The reason is that matching for our language is finitary [14], while unification is infinitary [8, 13]. The latter is undesirable, because it would cause infinite branching in the program execution tree. Therefore, we restrict the solving to a fragment that guarantees the existence of a terminating finitary procedure; matching is one such fragment. The restriction we impose on clauses and queries is well-modedness, which extends the corresponding notion for logic programs introduced in [9]. It forbids uninstantiated variables from appearing on one of the sides of unification problems and, hence, only matching problems arise.

More specifically, well-modedness is based on the notion of the mode of a relation. The mode of the relation \(\cdot \, \,{:}{:}\, \cdot \, \Longrightarrow \,\cdot \) is a function that defines its input and output positions, respectively, as \( in (\cdot \, \,{:}{:}\, \cdot \, \Longrightarrow \,\cdot )=\{1,2\}\) and \( out (\cdot \, \,{:}{:}\, \cdot \, \Longrightarrow \,\cdot )=\{3\}\). A mode is defined (uniquely) for a Prolog relation as well. A clause is moded if all its predicate symbols are moded. We assume that all \(\rho \)-clauses are moded. As for the Prolog clauses, we require modedness only for those that define a predicate occurring in the body of some \(\rho \)-clause. If a Prolog literal occurs in a query in conjunction with a \(\rho \)-literal, then its relation and the clauses that define this relation are also assumed to be moded.

Roughly, the idea of well-modedness is that the variables in the input positions of a literal should already have been seen in the output positions of earlier literals. Before defining it formally, we introduce the notation \( vars (E)\) for the set of variables occurring in an expression E, and define \( vars (E,\{p_1,\ldots ,p_n\})=\cup _{i=1}^n vars (E|_{p_i})\), where \(E|_{p_i}\) is the standard notation for the subexpression of E at position \(p_i\). The symbol \(\mathcal {V}_\mathsf{a}\) stands for the set of anonymous variables. A ground expression contains no variables. Then well-moded queries and clauses are defined as follows:

Definition 1

A query \(L_1,\ldots ,L_n\) is well-moded iff the following conditions hold for each \(1\le i\le n\):

  • \( vars (L_i, in (L_i))\subseteq \cup _{j=1}^{i-1} vars (L_j, out (L_j))\setminus \mathcal {V}_\mathsf{a}.\)

  • If \( L_i\) is a negative literal, then \( vars ({ L_i}, out ({ L_i}))\subseteq \cup _{j=1}^{i-1} vars ({ L_j}, out ({ L_j}))\cup \mathcal {V}_\mathsf{a}.\)

  • If \( L_i\) is a \(\rho \)-literal, then its strategy term is ground.

A clause \( L_0\, \mathop {\text {:-}}\, L_1,\ldots ,L_n\) is well-moded iff the following hold for each \(1\le i\le n\):

  • \( vars ({ L_i}, in ({ L_i}))\cup vars ({ L_0}, out ({ L_0}))\subseteq \cup _{j=0}^{i-1} vars ({ L_j}, out ({ L_j}))\setminus \mathcal {V}_\mathsf{a}.\)

  • If \( L_i\) is a negative literal, then

    $$\begin{aligned} vars ({ L_i}, out ({ L_i}))\subseteq \cup _{j=1}^{i-1} vars ({ L_j}, out ({ L_j}))\cup \mathcal {V}_\mathsf{a} \cup vars ({ L_0}, in ({ L_0})). \end{aligned}$$
  • If \( L_0\) and \({ L_i}\) are \(\rho \)-literals with the strategy terms \(st_0\) and \( st_i\), respectively, then \( vars ({ st_i})\subseteq vars ({ st_0})\).

It is easy to see that the clauses and queries in Figs. 1 and 2 are well-moded.
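The first condition of Definition 1 for queries can be sketched abstractly in Python. This is an illustrative simplification of our own: literals are reduced to their sets of input and output variables, and the groundness of strategy terms is not modeled:

```python
# Illustrative check of the first well-modedness condition for queries:
# every variable in an input position must occur in an output position of
# an earlier literal (and must not be anonymous).

def query_well_moded(literals, anonymous=frozenset()):
    """literals: list of (input_vars, output_vars) pairs, in query order."""
    seen = set()  # variables produced by output positions of earlier literals
    for in_vars, out_vars in literals:
        if not set(in_vars) <= (seen - anonymous):
            return False
        seen |= set(out_vars)
    return True

# merge_all_doubles :: (1,2,3,2,1) ==> s_Result : the input side is ground
# and s_Result occurs only in an output position, so the query is well-moded.
print(query_well_moded([((), ("s_Result",))]))   # -> True

# a query whose input side uses the never-bound variable s_X is not:
print(query_well_moded([(("s_X",), ("s_Y",))]))  # -> False
```

The second well-moded example below (transforming a ground sequence into \( s\_Result \)) is exactly the shape of the queries in Sect. 2.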

P\(\rho \)Log requires program clauses to be well-moded. Well-modedness of queries is checked when they are evaluated. There is no restriction on Prolog clauses whose predicates are not used in \(\rho \)-clauses.

The P\(\rho \)Log execution principle is based on depth-first inference with leftmost literal selection in the goal. If the selected literal is a Prolog literal, it is evaluated in the standard way. If it is a \(\rho \)-atom of the form \( st \,{:}{:}\, \tilde{s}_1 \Longrightarrow \tilde{s}_2\), the crucial point is that, due to well-modedness, \( st \) and \(\tilde{s}_1\) do not contain variables. Then a (renamed copy of a) program clause \( st' \,{:}{:}\, \tilde{s}'_1 \Longrightarrow \tilde{s}'_2\, \mathop {\text {:-}}\, body \) is selected such that \( st' \) matches \( st \) and \(\tilde{s}'_1\) matches \(\tilde{s}_1\) with a substitution \(\sigma \). Next, the selected literal in the query is replaced by the conjunction \(( body )\sigma ,\, \mathbf{{id}}\,{:}{:}\, \tilde{s}'_2\sigma \Longrightarrow \tilde{s}_2 \), where \(\mathbf{{id}}\) is the built-in strategy for identity: it succeeds iff its \( rhs \) matches its \( lhs \). Evaluation continues with this new query. Success and failure are defined in the standard way. Backtracking makes it possible to explore other alternatives, which may come from matching the selected query literal against the head of the same program clause in a different way, or against the head of another program clause. Negative literals are processed by the negation-as-failure rule. Well-modedness guarantees that whenever a negative \(\rho \)-literal is selected during execution, it contains no variables except, possibly, some anonymous variables in its right-hand side.

Example 1

To illustrate the described P\(\rho \)Log inference step, consider again the example about \(\mathrm{{merge\_all\_doubles}}\) from Sect. 2. To transform the query

$$\begin{aligned} \text {?- } \mathrm{{merge\_all\_doubles}} \,{:}{:}\, (1,\,2,\,3,\,2,\,1) \Longrightarrow s\_Result . \end{aligned}$$

with the clause

$$\begin{aligned} \mathrm{{merge\_all\_doubles}} \,{:}{:}\, s\_X \Longrightarrow s\_Y \ \mathop {\text {:-}}\ \mathbf{{nf}}(\mathrm{{merge\_doubles}}) \,{:}{:}\, s\_X \Longrightarrow s\_Y ,\ !., \end{aligned}$$

P\(\rho \)Log takes the matcher \(\{ s\_X \mapsto (1,\,2,\,3,\,2,\,1) \}\), and produces a new query

$$\begin{aligned} \text {?- } \mathbf{{nf}}(\mathrm{{merge\_doubles}}) \,{:}{:}\, (1,\,2,\,3,\,2,\,1) \Longrightarrow s\_Y ,\ !,\ \mathbf{{id}}\,{:}{:}\, s\_Y \Longrightarrow s\_Result . \end{aligned}$$

4 Implementation

P\(\rho \)Log is implemented in SWI-Prolog. Its programs have the extension .rho. Figure 1 shows the program merge.rho for merging doubles in sequences and trees, discussed in Sect. 2.

Fig. 1.

Program merge.rho for merging doubles in sequences and trees.

P\(\rho \)Log variables are, actually, Prolog constants. Therefore, one cannot directly rely on Prolog unification to compute values for those variables. Consequently, the answers to a query have to be computed as explicit substitutions showing what the P\(\rho \)Log variables map to. This requires a P\(\rho \)Log query to be wrapped into a meta-query that then returns the substitutions. For the queries considered in Sect. 2, such meta-queries can be seen in Fig. 2. The predicate symbol used in them is ?.

Fig. 2.

Querying merge.rho.

The substitutions indicate that there is a background solving mechanism in P\(\rho \)Log that performs matching and computes the corresponding substitutions. Indeed, it is implemented in SWI-Prolog, following the algorithm from [14]. However, it turns out that in the absence of context variables we can avoid this implementation and, instead, compute matching substitutions directly by Prolog unification, which is, naturally, more efficient. We have implemented this version of P\(\rho \)Log as well, calling it P\(\rho \)Log-light. To distinguish the two, we sometimes say P\(\rho \)Log-full for the version with context variables.

The P\(\rho \)Log distribution consists of exactly these two parts: P\(\rho \)Log-full and P\(\rho \)Log-light. Each part has a main file, called prholog.pl and prholog-l.pl, respectively. These files are responsible for setting up the environment and loading the corresponding version of P\(\rho \)Log. The major parts of both versions are the parser, the compiler, and the library of built-in strategies: the files parse.pl, compile.pl, and library.pl for P\(\rho \)Log-full, and parse-l.pl, compile-l.pl, and library-l.pl for P\(\rho \)Log-light, respectively.

Besides, the full version of P\(\rho \)Log contains a solver, solve.pl, for matching problems and regular constraints. The light version does not require such a solver, but it still needs to check regular constraints; this is done in the file constraints-l.pl.

A typical P\(\rho \)Log session starts by invoking SWI-Prolog and consulting the main P\(\rho \)Log file. After that, the user may write/edit a .rho file in her favorite editor and load it by executing the query ?- load(‘...filename.rho’), where ... stands for the full path. Next, the program can be queried as shown, e.g., in Fig. 2.

The parser and the compiler are invoked when a .rho file is loaded. Besides syntax errors, the parser also checks for well-modedness and for occurrences of P\(\rho \)Log variables in Prolog literals. If no errors are detected, the compiler compiles filename.rho into a Prolog file filename.pl, translating each P\(\rho \)Log clause into a Prolog clause. The file filename.pl is located in the same directory as filename.rho, is loaded immediately after compilation, and is deleted on exit.

The same parsing and compilation process takes place when P\(\rho \)Log queries are evaluated. After compilation, the obtained Prolog query is executed. Answers are given as explicit substitutions.

5 Library

The library consists of definitions of built-in strategies, implemented in Prolog. They greatly simplify programming in P\(\rho \)Log. These strategies are protected and cannot be redefined from a P\(\rho \)Log program. Currently there are 14 of them in the library. With a couple of exceptions, each of them can be used both with and without regular constraints. We give a brief overview of some of these strategies, without mentioning the constraints.

Choice. The syntax of this strategy is

$$\begin{aligned} \mathbf{{choice}}( strategy _1,\ldots , strategy _n) \,{:}{:}\, sequence _1 \Longrightarrow sequence _2, \end{aligned}$$

where \(n\ge 1\). It succeeds if and only if for some i, \( strategy _i \,{:}{:}\, sequence _1 \Longrightarrow sequence _2\) succeeds.

Composition. Composing strategies, making the output sequence of one the input for the other:

$$\begin{aligned} \mathbf{{compose}}( strategy _1,\ldots , strategy _n) \,{:}{:}\, sequence _1 \Longrightarrow sequence _2, \end{aligned}$$

where \(n\ge 2\). It first applies \( strategy _1\) to \( sequence _1 \). To its result, \( strategy _2\) is applied, and so on. \( sequence _2\) is the final result. \(\mathbf{{compose}}\) fails if one of its argument strategies fails in the process.

Closure. The syntax of this strategy is:

$$\begin{aligned} \mathbf{{closure}}( strategy ) \,{:}{:}\, sequence _1 \Longrightarrow sequence _2. \end{aligned}$$

It succeeds if \( sequence _2\) belongs to the closure set of transforming \( sequence _1\) by \( strategy \). The set elements are computed one after the other, by backtracking. \(\mathbf{{closure}}\) fails if the set is empty. An example of a query would be

$$\begin{aligned} ?- \, \mathbf{{closure}}( merge\_doubles ) \,{:}{:}\, (1,\,2,\,3,\,2,\,1) \Longrightarrow s\_Result . \end{aligned}$$

It gives five answer substitutions via backtracking:

  • \(\{ s\_Result \mapsto (1,\,2,\,3,\,2,\,1)\}\),

  • \(\{ s\_Result \mapsto (1,\,2,\,3,\,2)\}\),

  • \(\{ s\_Result \mapsto (1,\,2,\,3)\}\),

  • \(\{ s\_Result \mapsto (1,\,2,\,3,\,1)\}\),

  • \(\{ s\_Result \mapsto (1,\,2,\,3)\}\).
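The backtracking behavior of \(\mathbf{{closure}}\) on this example can be simulated with Python generators (an illustrative sketch in our own representation; nondeterminism is modeled by yielding alternatives):

```python
# Sketch of the closure built-in: yield the input sequence itself, then,
# "backtracking" over every single-step transform, the closure of each.

def merge_doubles(seq):
    """Yield every sequence obtained by merging one pair of duplicates."""
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] == seq[j]:
                yield seq[:j] + seq[j + 1:]  # keep first, drop second copy

def closure(strategy, seq):
    """Enumerate the closure set of transforming seq by strategy (DFS)."""
    yield seq
    for nxt in strategy(seq):
        yield from closure(strategy, nxt)

for ans in closure(merge_doubles, (1, 2, 3, 2, 1)):
    print(ans)
```

This depth-first enumeration produces the five answer sequences in exactly the order listed above, including the normal form \((1,\,2,\,3)\) twice, once for each merging order.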

Identity. The goal of this strategy is to transform a sequence to its identical one:

$$\begin{aligned} \mathbf{{id}} \,{:}{:}\, sequence _1 \Longrightarrow sequence _2. \end{aligned}$$

It succeeds iff \( sequence _2\) can match \( sequence _1\).

Returning all answers of the first applicable strategy, one by one. Denoted by \(\mathbf{{first\_all}}\):

$$\begin{aligned} \mathbf{{first\_all}}( strategy _1,\ldots , strategy _n) \,{:}{:}\, sequence _1 \Longrightarrow sequence _2, \end{aligned}$$

where \(n\ge 1\). It tries to apply \( strategy _1\) to \( sequence _1\). If this fails, it tries the next strategy, and so on. When a strategy that succeeds is found, \(\mathbf{{first\_all}}\) returns all answers computed by it in \( sequence _2\), via backtracking. If no strategy succeeds, \(\mathbf{{first\_all}}\) fails.

The strategy \(\mathbf{{first\_one}}\) mentioned earlier is similar to \(\mathbf{{first\_all}}\), with the only difference that it returns only one answer instead of all of them.

Returning all answers at once. It can be seen as an analog of findall for P\(\rho \)Log. The syntax is

$$\begin{aligned} \mathbf{{all\_answers}}( strategy ) \,{:}{:}\, sequence _1 \Longrightarrow sequence _2. \end{aligned}$$

It succeeds if and only if \( sequence _2\) is a sequence consisting of terms of the form \( ans (\tilde{s}_1), \ldots , ans (\tilde{s}_n)\), where \(\tilde{s}_1,\ldots , \tilde{s}_n\) are all the sequences obtained by applying \( strategy \) to \( sequence _1\). The symbol \( ans \) just plays the role of a constructor, to distinguish between different answer sequences in \( sequence _2\). We could ask

$$\begin{aligned} ?- \, \mathbf{{all\_answers}}( merge\_doubles ) \,{:}{:}\, (1,\,2,\,3,\,2,\,1) \Longrightarrow s\_Result . \end{aligned}$$

and obtain the answer \(\{ s\_Result \mapsto ( ans (1,\,2,\,3,\,2), ans (1,\,2,\,3,\,1))\}\).

Interactive mode. The syntax is:

$$\begin{aligned} \mathbf{{interactive}} \,{:}{:}\, sequence _1 \Longrightarrow sequence _2. \end{aligned}$$

It activates the interactive mode and starts a dialog with the user, asking her to provide a strategy, which is then applied to \( sequence _1\). The process is repeated so that the output sequence of the previous strategy application becomes the input for the next strategy provided by the user, and so on. The interactive process stops when the user types finish. At that moment, the current input sequence is returned in \( sequence _2\). \(\mathbf{{interactive}}\) fails when the user-provided strategy fails for the current input sequence.

n-fold iteration. Specifies how many times a strategy can be applied repeatedly:

$$\begin{aligned} \mathbf{{iterate}}( strategy , n) \,{:}{:}\, sequence _1 \Longrightarrow sequence _2. \end{aligned}$$

It applies \( strategy \) repeatedly, n times, starting from \( sequence _1\). The result is returned in \( sequence _2\). \(\mathbf{{iterate}}\) fails if one of the applications fails.

The normal form strategy \(\mathbf{{nf}}\) is similar, but instead of applying a strategy a fixed number of times, it applies it until no further transformation is possible, and returns the last sequence.
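A minimal Python model of \(\mathbf{{iterate}}\) and \(\mathbf{{nf}}\) (an illustrative sketch under our own encoding: a deterministic strategy is a function returning the new sequence, or None on failure; the toy strategy merge_adjacent is ours):

```python
def iterate(strategy, n, seq):
    """iterate: apply `strategy` n times, threading each result into
    the next application; fail if any application fails."""
    for _ in range(n):
        seq = strategy(seq)
        if seq is None:
            raise ValueError("iterate: strategy failed")
    return seq

def nf(strategy, seq):
    """nf: apply `strategy` until it is no longer applicable and
    return the last obtained sequence."""
    while (nxt := strategy(seq)) is not None:
        seq = nxt
    return seq

def merge_adjacent(seq):
    """Toy strategy: merge the first adjacent pair of equal terms."""
    for i in range(len(seq) - 1):
        if seq[i] == seq[i + 1]:
            return seq[:i] + seq[i + 1:]
    return None                            # not applicable

print(iterate(merge_adjacent, 2, (1, 1, 2, 2, 3)))  # (1, 2, 3)
print(nf(merge_adjacent, (1, 1, 2, 2, 3)))          # (1, 2, 3)
```

Note the difference: \(\mathbf{{iterate}}\) with n = 3 would fail on this input (the third application is not possible), while \(\mathbf{{nf}}\) simply stops and returns the last sequence.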

Mapping a strategy to a sequence. Mapping is a common operation in declarative programming:

$$\begin{aligned} \mathbf{{map}}( strategy ) \,{:}{:}\, sequence _1 \Longrightarrow sequence _2. \end{aligned}$$

It applies \( strategy \) to each term of \( sequence _1\). For each input term, \( strategy \) may, in general, return a sequence (not necessarily a single term). A sequence constructed of these results (in the same order) is then returned in \( sequence _2\). \(\mathbf{{map}}\) fails when the application of \( strategy \) to a term from \( sequence _1\) fails. When \( sequence _1\) is empty, \( sequence _2\) is empty as well.

A variation of this strategy, \(\mathbf{{map\_to\_subhedges}}\), splits \( sequence _1\) nondeterministically into nonempty subsequences (when \( sequence _1\) is not empty) and applies \( strategy \) to each such subsequence. A sequence constructed from these results (in the same order) is returned in \( sequence _2\). \(\mathbf{{map\_to\_subhedges}}\) fails when \( sequence _1\) cannot be split in such a way that the application of \( strategy \) succeeds for each subsequence of the split. When \( sequence _1\) is empty, so is \( sequence _2\).
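The two mapping strategies can be sketched in Python as follows (illustrative only; pmap is named so as not to shadow Python's built-in map, and the toy strategies dup and wrap are our own):

```python
def pmap(strategy, seq):
    """map: apply `strategy` (term -> tuple of terms, or None on
    failure) to every term and concatenate the results in order."""
    out = ()
    for t in seq:
        r = strategy(t)
        if r is None:
            return None                    # map fails if any step fails
        out += r
    return out

def map_to_subhedges(strategy, seq):
    """Yield every result of splitting `seq` into nonempty consecutive
    subsequences and applying `strategy` to each piece."""
    if seq == ():
        yield ()
        return
    for i in range(1, len(seq) + 1):       # first piece: seq[:i]
        head = strategy(seq[:i])
        if head is not None:
            for rest in map_to_subhedges(strategy, seq[i:]):
                yield head + rest

dup = lambda t: (t, t)                     # duplicate a single term
wrap = lambda s: (("g",) + s,)             # wrap a subsequence under g

print(pmap(dup, (1, 2)))                   # (1, 1, 2, 2)
print(list(map_to_subhedges(wrap, (1, 2))))
```

For the input (1, 2), map_to_subhedges produces two answers, one per split: the pieces (1), (2) each wrapped separately, and the whole piece (1, 2) wrapped once.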

Rewriting. Yet another common transformation: it transforms a term not necessarily at the top position but, in general, by transforming one of its subterms:

$$\begin{aligned} \mathbf{{rewrite}}( strategy ) \,{:}{:}\, term _1 \Longrightarrow term _2. \end{aligned}$$

It succeeds if and only if \( term _2\) is obtained from \( term _1\) by applying \( strategy \) to a subterm of it. Note that one can easily define rewriting inside full P\(\rho \)Log:

$$\begin{aligned}&rewrite ( i\_ Strategy ) \,{:}{:}\, c\_Context ( i\_ Term _1) \Longrightarrow c\_Context ( i\_ Term _2) \ \mathop {\text {:-}}\\&\quad i\_ Strategy \,{:}{:}\, i\_ Term _1 \Longrightarrow i\_ Term _2. \end{aligned}$$

Nevertheless, we decided to provide the predefined strategy for rewriting in the library, because it is quite a frequently used transformation.
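The role of the context variable \( c\_Context \) in the definition above can be modeled by a recursive traversal. A minimal Python sketch (our own encoding: compound terms as tuples headed by the function symbol, constants as strings, and a strategy as a generator of results; the toy strategy only_a is hypothetical):

```python
def rewrite(strategy, term):
    """Yield every term obtained from `term` by applying `strategy`
    to exactly one subterm; the context variable c_Context of the
    definition above is modeled by the recursive traversal."""
    yield from strategy(term)              # apply at the top position
    if isinstance(term, tuple):            # compound term: ("f", a1, ..., aN)
        head, args = term[0], term[1:]
        for i, a in enumerate(args):
            for a2 in rewrite(strategy, a):
                yield (head,) + args[:i] + (a2,) + args[i + 1:]

def only_a(t):
    """A toy one-step strategy: rewrite the constant a to b."""
    if t == "a":
        yield "b"

print(list(rewrite(only_a, ("f", ("g", "a"), "a"))))
```

On f(g(a), a) this yields the two expected rewrites, f(g(b), a) and f(g(a), b), one per occurrence of a.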

6 Development Environment

P\(\rho \)Log can be used in any development environment that is suitable for SWI-Prolog. We provide a special Emacs mode for P\(\rho \)Log, which extends Stefan D. Bruda's Prolog mode for Emacs.Footnote 1 It supports syntax highlighting, makes it easy to load P\(\rho \)Log programs and anonymize variables via the menu, etc. Figure 3 gives an idea of how it looks.

A tracing tool for P\(\rho \)Log is under development. The standard Prolog tracer is too fine-grained for this purpose, since it steps through all parsing and compilation predicates that are invoked when a P\(\rho \)Log query is evaluated. Instead, the P\(\rho \)Log-specific tracing/debugging tool should ignore (by default) all intermediate Prolog steps and show only those that are directly related to P\(\rho \)Log inference.

7 Discussion and Final Remarks

The main advantage of using P\(\rho \)Log is its flexibility in specifying nondeterministic computations, which allows one to combine conditional transformation rules neatly with logic programming. Strategies help to separate transformation rules from the control of their application, which makes the rules reusable in different transformations. It also means that, unlike in Prolog, the user can apply the program clauses in a different order for different queries, without rewriting the code.

Assume that we have two P\(\rho \)Log rules, one for the top-level transformation of a term, and the other one for transforming an argument:

$$\begin{aligned}&transform\_top ( i\_ Strategy ) \,{:}{:}\, i\_ Term _1 \Longrightarrow i\_ Term _2\ \mathop {\text {:-}}\\&\qquad i\_ Strategy \,{:}{:}\, i\_ Term _1 \Longrightarrow i\_ Term _2.\\&transform\_arg ( i\_ Strategy )\, {:}{:}\\&\qquad f\_Fun ( s\_X , i\_ Term _1, s\_Y ) \Longrightarrow f\_Fun ( s\_X , i\_ Term _2, s\_Y ) \ \mathop {\text {:-}}\\&\qquad i\_ Strategy \,{:}{:}\, i\_ Term _1 \Longrightarrow i\_ Term _2. \end{aligned}$$
Fig. 3. Emacs P\(\rho \)Log session.

Note that the use of function and sequence variables makes the code universal (it applies to any term, independently of the top function symbol and the number of arguments) and compact (one does not need to implement term decomposition and traversal explicitly; the declarative specification given above is sufficient).

Now, innermost and outermost rewriting strategies can be implemented by strategy combinations only, imposing the right application order of the transformation rules.

Innermost rewriting is defined by the following recursive strategy:

$$\begin{aligned}&innermost\_rewriting ( i\_ Strategy ) := \\&\quad \mathbf{{first\_all}}( transform\_arg ( innermost\_rewriting ( i\_ Strategy )), \\&\qquad transform\_top ( i\_ Strategy )). \end{aligned}$$

It gives priority to the argument transformation by innermost rewriting (wrt the given strategy) over the top-position transformation (wrt the given strategy): if the former is applicable, \(\mathbf{{first\_all}}\) makes sure that all its possible results are returned and the latter is not tried. For instance, assume that \( str \) is a concrete strategy defined by two clauses:

$$\begin{aligned}&str \,{:}{:}\, f( s\_X ) \Longrightarrow g( s\_X ). \qquad \qquad \qquad str \,{:}{:}\, f(f( i\_X )) \Longrightarrow i\_X . \end{aligned}$$

If we ask to rewrite h(f(f(a)), f(a)) by innermost rewriting:

$$\begin{aligned} ?- \, innermost\_rewriting ( str ) \,{:}{:}\, h(f(f(a)),f(a)) \Longrightarrow i\_Result . \end{aligned}$$

P\(\rho \)Log will return two results: h(f(g(a)), f(a)) and h(f(f(a)), g(a)).

If we want to experiment with outermost rewriting, we only need to define the corresponding strategy (essentially, by changing the application order of the rules, without altering them):

$$\begin{aligned}&outermost\_rewriting ( i\_ Strategy ) := \\&\quad \mathbf{{first\_all}}( transform\_top ( i\_ Strategy ),\\&\qquad transform\_arg ( outermost\_rewriting ( i\_ Strategy ))). \end{aligned}$$

Rewriting h(f(f(a)), f(a)) by this strategy gives three results: h(g(f(a)), f(a)), h(a, f(a)), and h(f(f(a)), g(a)).
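The whole example can be replayed in a Python model (an illustrative sketch under our own encoding: terms as nested tuples headed by the function symbol, a strategy as a function from a term to the list of its results; innermost and outermost are specialized to \( str \) for brevity):

```python
def strat(t):
    """The two clauses of str: f(s_X) => g(s_X) and f(f(i_X)) => i_X."""
    if isinstance(t, tuple) and t[0] == "f":
        yield ("g",) + t[1:]                       # f(s_X) => g(s_X)
        if len(t) == 2 and isinstance(t[1], tuple) \
                and t[1][0] == "f" and len(t[1]) == 2:
            yield t[1][1]                          # f(f(i_X)) => i_X

def first_all(*strategies):
    """Return all answers of the first strategy that succeeds."""
    def combined(t):
        for st in strategies:
            results = list(st(t))
            if results:
                return results
        return []
    return combined

def transform_top(st):
    return lambda t: list(st(t))

def transform_arg(st):
    """Transform exactly one argument, leaving the others unchanged."""
    def s(t):
        out = []
        if isinstance(t, tuple):
            head, args = t[0], t[1:]
            for i, a in enumerate(args):
                for a2 in st(a):
                    out.append((head,) + args[:i] + (a2,) + args[i + 1:])
        return out
    return s

def innermost(t):
    return first_all(transform_arg(innermost), transform_top(strat))(t)

def outermost(t):
    return first_all(transform_top(strat), transform_arg(outermost))(t)

term = ("h", ("f", ("f", "a")), ("f", "a"))        # h(f(f(a)), f(a))
print(innermost(term))   # two results
print(outermost(term))   # three results
```

Running it on the encoding of h(f(f(a)), f(a)) reproduces exactly the answers stated above: two results for innermost rewriting and three for outermost rewriting.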

The definitions also clearly illustrate the difference between these two rewriting strategies.

If one wants to compute only one result instead of all, the only change needed is to replace \(\mathbf{{first\_all}}\) by \(\mathbf{{first\_one}}\) in the corresponding strategy.

This example shows some advantages of P\(\rho \)Log: compact and declarative code; capabilities of expression traversal without explicitly programming it; the ability to use clauses in a flexible order with the help of strategies. Besides, P\(\rho \)Log has access to the whole infrastructure of its underlying Prolog system. These features make P\(\rho \)Log suitable for nondeterministic computations, implementing rule-based algorithms and their control, manipulating XML documents, etc.

As future work, one direction is to finish the implementation of the P\(\rho \)Log tracer. We also plan to improve the compiler by adding more optimization capabilities.