Keywords

1 Introduction

Language-oriented programming [8, 14, 31] is a paradigm that has received a lot of attention in recent years. Behind this paradigm is the idea that different parts of a programming solution should be expressed with different problem-specific languages. For example, programmers can write JavaScript code for their web application and enjoy using JQuery to access DOM objects in some parts of their code, and WebPPL to do probabilistic programming in other parts [16]. To realize this vision, language workbenches have emerged as sophisticated tools to assist programmers with the creation, reuse and composition of languages.

Languages as first-class citizens [7] is an approach to language-oriented programming that advocates that language definitions should have the same status as any other expression in the context of a general-purpose programming language. In this approach language definitions are run-time values, just like integers, for example, and they can be the result of computations, bound to variables, passed to and returned by functions, and inserted into lists, to name a few possibilities.

  [6, 7] is a functional language-oriented programming language with languages as first-class citizens.   is implemented in a combination of OCaml and the higher-order logic programming language \(\lambda \)Prolog [24]. The core of the language implementation is, however, an interpreter written in \(\lambda \)Prolog. Specifically, programs are compiled into \(\lambda \)Prolog terms and executed with such interpreter.

To implement language-oriented programming operations, the features of higher-order logic programming have proved to be exceptionally fitting. In particular, formulae as first-class citizens make it easy to have a run-time data type for languages and implement operations that manipulate languages at run-time, including switching evaluation strategies on demand (call-by-value vs call-by-name). Furthermore, hypothetical reasoning [15] (i.e. deriving implicative goals)Footnote 1 makes it easy to execute programs with arbitrary languages defined by programmers, as well as switch from a language to another during computation.

Goal of the Paper. Our goal is to demonstrate that higher-order logic programming can be a great fit for implementing language-oriented systems. To this aim, we describe our compilation methods and highlight how the distinctive features of higher-order logic programming have been a natural fit in this context.

Ultimately, allows for non-trivial language-oriented programming scenarios, and yet its interpreter is 73 lines of \(\lambda \)Prolog code. This is remarkable in the context of language-oriented systems.

Roadmap of the Paper. Section 2 reviews higher-order logic programming as adopted in \(\lambda \)Prolog. Section 3 gives a general overview of the implementation of before diving into specific aspects. Section 4 discusses our implementation of (programmer-defined) language definitions. Section 5 covers our implementation of the operations that manipulate languages. Section 6 provides details on our implementation w.r.t. using languages to execute programs. Section 7 covers the scenario of switching strategies at run-time. Section 8 discusses related work, and Sect. 9 concludes the paper.

2 Higher-Order Logic Programming

\(\lambda \)Prolog is a flagship representative of higher-order logic programming languages [24]. This section reviews its features. We do not give a complete account of \(\lambda \)Prolog. Instead, we discuss the features that play a role in the next sections. \(\lambda \)Prolog extends Prolog with the following elements: types, formulae as first-class citizens, higher-order abstract syntax, and hypothetical reasoning.

Typed Logic Programming. \(\lambda \)Prolog programs are equipped with a signature that defines the entities that are involved in the program. Programs must follow the typing discipline that is declared in this signature or they would be rejected. For example, if we were to implement a simple language with numbers and additions, we would have the following declarations.

figure i

The keyword \(\mathtt {kind}\) declares the entities in the program. The keyword \(\mathtt {type}\) is used to specify how to create terms of such entities.

As in Prolog, the computation takes place through the means of logic programming rules. Logic programming rules can derive formulae, which are built with predicates. Predicates, as well, must be declared with a type in \(\lambda \)Prolog. For example, a type checking relation and a reduction relation can be declared as follows.

figure j

The keyword \(\mathtt {prop}\) denotes that \(\mathtt {typeOf}\) and \(\mathtt {step}\) build a formula when applied to the correct type of arguments.

Formulae as First-Class Citizens. \(\lambda \)Prolog extends Prolog also in that formulae can be used in any well-typed context. For example, below we intentionally split our type checker in an unusual way.

figure k

The predicate \(\mathtt {getFormula}\) takes two arguments. The first is an expression and is an input, and the second is a proposition and is an output. The predicate \(\mathtt {getFormula}\) returns the formula we should check to establish that the term is well-typed (the output type is ignored for the sake of this example). This example shows that formulae can be arguments. Furthermore, after \(\mathtt {check}\) calls \(\mathtt {getFormula}\) to retrieve the formula \(\mathtt {F}\), this formula can be used as a premise in the rule, as shown in the last line.

Higher-Order Abstract Syntax (HOAS). HOAS is an approach to syntax in which the underlying logic can appeal to a native \(\lambda \)-calculus for modeling aspects related to binding [25]. Suppose that we were to add the operators of the \(\lambda \)-calculus, we would define the following.

figure l

The argument of \(\mathtt {abs}\) is an abstraction from an expression to an expression. To model the identity function we write , where the highlighted part of this term points out the syntax used by \(\lambda \)Prolog for writing HOAS abstractions. In the beta-reduction rule above we have that \(\mathtt {R}\) is an abstraction and therefore we can use it with a HOAS application \(\mathtt {(R\;V)}\) to produce a term. \(\lambda \)Prolog takes care of performing the substitution for us.

Hypothetical Reasoning. \(\lambda \)Prolog also extends Prolog with hypothetical reasoning [24] (i.e. deriving implicative goals [15]). To appreciate this feature consider the following logic program.

figure n

The city london and portland are not connected. However, in \(\lambda \)Prolog we can write the formula:

figure o

This formula asks: “Were nyc connected to chicago, would london be connected to portland?”. At run-time the query connected london portland is interrogated in the logic program in which the fact connected nyc chicago is added.

3 Basic Overview of Lang-n-Play

  is a functional language-oriented programming language [7]. Programmers can define their own languages and use them to execute programs.   is implemented partly in OCaml and partly in \(\lambda \)Prolog. Precisely, the following is the architecture of .

figure s

Programs are parsed and type checked in OCaml. These two aspects are not discussed in the paper because they do not play a role in our message about the effectiveness of higher-order logic programming.

Next, programs are compiled into \(\lambda \)Prolog. The interpreter of programs is a \(\lambda \)Prolog logic program. Our OCaml compilation produces the \(\lambda \)Prolog term that represents the program, and gives it to this interpreter. Programmers do not launch the interpreter manually. Our OCaml code is interfaced to the ELPI interpreter of \(\lambda \)Prolog through the ELPI OCaml package [9], and loads an external file that contains the interpreter written in \(\lambda \)Prolog.

In this paper we discuss the part that is enclosed in the red rectangle, as it is the core of the implementation of , and highlights how higher-order logic programming can accommodate the features of .

Our interpreter defines a kind for expressions called \(\mathtt {expLO}\), and defines two relations for reduction steps and detecting values.

figure z

The suffix \(\mathtt {LO}\) in the names of operators and relations are meant to recall that we are in a language-oriented language. We introduce the elements of \(\mathtt {expLO}\) as we encounter them in the next sections. There are a number of aspects of that we do not discuss in this paper. For example, although includes common features of functional programming such as booleans, if-then-else, lists, \(\mathtt {letrec}\), and \(\mathtt {import}\), we omit discussing them because they are not relevant to this paper.

4 Defining Languages

  provides syntax for defining languages. Below is the example of a definition of a language with lists and an operator elementAt which accesses an element of a list based on its position (position 0 is the first element).

figure ad

Languages are defined within {! ... !}, as in lines 1–22.   makes use of a domain-specific language that closely resembles the way researchers define and share languages in operational semantics. (The Ott language achieved the same effect over ten years ago [28], and we adopt a similar syntax).

As typical, languages define a grammar (lines 2–10) and an inference system (lines 12–21). An inference system may define a type system and a reduction semantics.

The syntax for defining grammars is quite standard. As in many language workbenches [8, 14, 31], language definitions can use lists with [ ... ]. For example, Environment Gamma  :: = [x : T] means Gamma is a list of formulae of that shape.   provides the operation in for testing membership on lists, as in line 12. Furthermore, there are two special grammar categories, Relation and Starting Call. The former simply declares relations, and the latter informs on how to call the type checker and the evaluator.

The syntax for defining inference systems is also rather familiar to operational semantics practitioners. Perhaps, the biggest departure is that the horizontal line of an inference rule is replaced with an inverse implication that can be read “provided that”, and we use an explicit syntactic and-operator when we have multiple premises.

The language definition above serves as our running example throughout the paper. We refer to the lines 2–21 as listLines in what follows.

4.1 \(\lambda \)Prolog Implementation of Language Definitions

It is rather natural to compile language definitions such as the one above into \(\lambda \)Prolog because operational semantics is based on inference systems. These, in turn, map naturally to logic programming rules. For example, the language above compiles to the following (we show only an excerpt).

figure ai

The fact that language definitions map well to higher-order logic programs has been previously demonstrated with the work of Twelf [26], and \(\lambda \)Prolog [24].

Since handles programmer-defined languages that may be manipulated at run-time, we accommodate languages with an internal representation. In particular, the language above is represented with a list of formulae with the following operator:

figure ak

Therefore, the language listLines is compiled as follows.

figure al

This is our run-time representation for languages. Notice that since we need only the reduction rules to execute programs we compile inference rules only (not grammar), with the exception of values and evaluation contexts, which are turned into rules.

We shall refer to the list of elements at lines 3–16 as listsInLP.

5 Operations on Languages

  provides a handful of operations on language definitions. Below we discuss the following operations: let-binding, union of languages, functions on languages, and removal of rules.

Let-Binding.   can bind a language definition to a variable in typical ML-style, as in

figure ao

Therefore, our interpreter includes a let-operation and its reduction semantics.

figure ap

The code above is then compiled to

figure aq

which reduces to in one step. (This example also shows that languages can be the result of computations.)

Language Union. Another operation of is language union, performed with the binary operator \(\mathtt {U}\). For example, notice that the language for lists is unsafe: if we called elementAt asking for the 4-th element of a list that contains only 2 elements we would get stuck, because there is no specified behavior for that case. We can add reduction rules that cover those cases as follows. (We refer to the language created by the union operation below as \(\mathtt {safeLists}\)).

figure at

This code adds the error to the language and adds appropriate reduction rules. Our interpreter includes the language union operation and its reduction semantics.

figure au

where \(\mathtt {append}\) is a polymorphic list append defined in the interpreter (straightforward and here omitted). The union operation above is compiled as

figure av

which reduces to in one step.

Removal of Rules.   includes an operation for removing rules from a language. For example, the union above adds two extra rules but the sole rule step (elementAt V nil) myError :- value V would be sufficient.

To modify \(\mathtt {safeLists}\) and create the more compact language that has only one rule we can execute the following program.

figure az

The operation \(\mathtt {remove}\) takes in input a rule and a language, and returns a language. This code removes one of the rules from \(\mathtt {safeLists}\) at lines 3 and 4. The language so produced is then used in the removal of the other rule at lines 1–3. Line 5 adds the safe reduction rule for \(\mathtt {elementAt}\) with a union operation.

Our interpreter includes the rule removal operation and its reduction semantics.

figure ba

where \(\mathtt {listRemove}\) is a polymorphic predicate that matches elements of a list with a given element and removes the element if the match succeeds. This predicate is also defined in the interpreter (straightforward and here omitted).

The remove operations above, excluding the union at line 5, are compiled as

figure bb

which reduces to in two steps.

\(\lambda \)Prolog grants us a powerful equality on formulae and the removal operation is far from performing a textual match on the rule to remove. For example, the following two rules are equal in \(\lambda \)Prolog.

$$\begin{array}{c} (*)\qquad \texttt {step (elementAt (succ MyVar) nil) myError :- value MyVar }\\ = \\ \texttt { step (elementAt (succ V) nil) myError :- value V } \end{array} $$

Therefore, we obtain the same results if we used the formula (*) at line 4. Thanks to \(\lambda \)Prolog, formulae are up-to renaming of variables and alpha-equivalence of HOAS abstractions.

Functions on Languages. As typical in programming languages, we often would like to pack instructions in functions for the sake of abstraction.   provides functions on languages. For example, instead of performing language union inline we can create a function that adds the desired safety checks, as in

figure be

Our interpreter includes abstractions and applications, as well as their reduction semantics.

figure bf

  compiles the let-binding above in the following way.

figure bh

This program reduces the \(\mathtt {letLO}\) operation in one step and we obtain

figure bi

In turn, this program reduces in one step to

figure bj

which produces the expected language in one step.

6 Executing Programs and Language Switch

Of course, languages can be used to execute programs. The code below shows the expression to do so. (For readability, we use standard notation for numbers and lists rather than sequences of \(\mathtt {succ}\) and \(\mathtt {cons}\)).

figure bl

We call this type of expression program execution and is of the form language  >  program. The program above returns a value together with the language with which it has been computed:

.

Our interpreter includes the operation for executing programs and its reduction semantics.

figure bn

In the declaration at the top, \(\mathtt {program}\) is the kind for programs. Intuitively, elements of \(\mathtt {program}\) are S-expressions, i.e. a top-level operator followed by a series of arguments, which, too, can be programs. Notice that the language argument of \(\mathtt {execLO}\) (first argument) is an expression. Although above we have explicitly written the language to be used, that component can be an expression that evaluates to a language, for example as in

figure bo

The reduction rule for \(\mathtt {execLO}\) deserves some words. The key idea is that we use hypothetical reasoning. In Sect. 2 we have seen that we can use this feature to temporarily add facts and run an augmented logic program. Above, instead, we do not add a fact but a list of formulae \(\mathtt {Language}\). Moreover, this list of formulae is not a list of facts but a list of rules (rules such as step (elementAt zero (cons V1 V2)) V1 :- value V1, value V2.). This has the effect of augmenting the logic program that we are currently running (which is our interpreter!) with new rules. In particular, these rules define the operational semantics of the language that we need to execute. The interpreter then interrogates \(\mathtt {(step\;Prg\;Prg')}\) to compute the step from these new rulesFootnote 2.

For example, the code above compiles to

figure br

The current logic program (that is, our interpreter) is augmented with the rules listsInLP and we execute the query

\(\mathtt {(step\;\;(elementAt \;1\;[1,2,3])\;\;Prg')}\).

This produces . (Notice that the query asks for one step).   keeps executing until a value is produced. Therefore at the second step we run the query \(\mathtt {(step\;\;(elementAt \;0\;[2,3])\;\;Prg')}\). This query returns the result , which our interpreter detects as a value. The execution of programmer-defined languages and the execution of operations are never confused because the former makes use of the predicate \(\mathtt {step}\) and the latter makes use of the predicate \(\mathtt {stepLO}\).

The way our interpreter recognizes that we have obtained a value is through the predicate \(\mathtt {valueLO}\), which our interpreter defines with

figure bx

Let us recall that once listsInLP is loaded in the current logic program it also contains the rules that define the values of the loaded language, defined with the predicate \(\mathtt {value}\). These rules are, for example, and \(\mathtt {value \;succ\;V \;:- \;value \;V.}\), and so on. Then we run the query to detect if \(\mathtt {Prg}\) is a term that the loaded language defines as a value.

Language Switch.   also allows for switching languages at run-time. Consider the following program.

figure cb

This code defines a language with pairs. Afterwards, it makes use of the list language to perform a list access. However, the first argument of \(\mathtt {elementAt}\) (the position) is computed by executing another program in another language. In particular, the position is computed by executing fst (pair 1 0) in the language \(\mathtt {pairs}\).

This program returns the following value (recall that position 1 asks for the second element of the list).

  .

Implementing this language switch is easy with higher-order logic programming. When we execute

figure cd

The interpreter adds the rules listsInLP and evaluates the program that starts with \(\mathtt {elementAt}\). When the interpreter encounters another \(\mathtt {execLO}\) it applies the same reduction rule of \(\mathtt {execLO}\) that we have seen. This has the effect of adding the language with pairs on top of the language with lists. This augmented language is then used to evaluate fst (pair 1 0) with the query (step (fst (pair 1 0)) Prg’). The nested \(\mathtt {execLO}\) detects when fst (pair 1 0) evaluates to a value in the way described above, that is, the query (value 1) succeeds. At this point, \(\mathtt {execLO}\) simply leaves the value it has just computed in the context in which it has been executed. Therefore, \(\mathtt {elementAt}\) simply continues with the value 1 as first argument, oblivious as to how this value has been computed.

Remark on the Semantics of Language Switch. The semantics of language switch is such that the current language is extended with new rules. Therefore, the switch does not replace a language with a completely different language. The semantics we adopt is in 1-1 correspondence with the semantics of hypothetical reasoning. We believe that this facilitates writing correct programs because the child language must at least share some values with the parent language, as the nested computation leaves a value in the context of the parent. Therefore, this value must be understood by the parent language.

Notice that the overall result is 2 in the language that does not contain pairs. Indeed, \(\mathtt {pairs}\) has been added and then removed by \(\lambda \)Prolog after the nested \(\mathtt {execLO}\) has finished.

7 Valuehood Abstractions

Strategies play a central role in computing. Examples of notable strategies are call-by-value and call-by-name in the \(\lambda \)-calculus. In we can define the \(\lambda \)-calculus in a way that allows the strategy to be chosen at run-time. We do so with valuehood abstractions:

figure cg

Here, \(\mathtt {lambda}\) is not a language definition. It is, instead, a valuehood abstraction. This is a function that takes in input a kind of expression called \(\mathtt {strategy}\). The variable \(\mathtt {vh}\) is bound in the body of \(\mathtt {lambda}\) and can appear as a variable in inference rules, as highlighted. The meaning is that it will be discovered later whether the inference rule fires when the variable is a value or an ordinary expression. The two strategies are represented with the constants \(\mathtt {EE}\) and \(\mathtt {VV}\), respectively. The application returns the language with the reduction rule (app (abs @x e) e2) –> e[e2/x], which fires irrespective of whether \(\mathtt {e2}\) is a value or not, in call-by-name style. The application \(\mathtt {(lambda \;VV)}\), instead, return the language with rule (app (abs @x e) v) –> e[v/x], which fires when the argument is a value, in call-by-value style.

To realize this, we take advantage of formulae as first-class citizens. The compilation of \(\mathtt {lambda}\) is:

figure ci

The variable \(\mathtt {vh}\) is a HOAS abstraction from terms to formulae. Intuitively, when we pass \(\mathtt {EE}\) we want to say that the premise (vh ARG) should not add any additional conditions. We do so by compiling \(\mathtt {EE}\) to the function . The application \(\mathtt {(lambda \;EE)}\) is then compiled into an ordinary application appLO.

For \(\mathtt {(lambda \;EE)}\), after the parameter passing we end up with

 

which \(\lambda \)Prolog converts to

language [step (app (abs R) ARG) (R ARG) :- true]

When we pass \(\mathtt {VV}\), instead, we want to say that the premise (vh ARG) should be satisfied only so long that ARG is a value. To do so, we compile the constant \(\mathtt {VV}\) to . When we execute \(\mathtt {(lambda \;VV)}\) we end up with

 

which \(\lambda \)Prolog converts to

 

Therefore, we place a new premise to check that \(\mathtt {ARG}\) is a value.

8 Related Work

The main related work is the vision paper [7]. We have used a variant of the example in [7] to demonstrate our compilation methods. We have also addressed remove with a different example, and the example on language switch is new.

There are two main differences between [7] and this paper.

  •   [7] proposes the approach of languages as first-class citizens and exemplifies it with an example in . [7] does not discuss any implementation details of . On the contrary, this paper’s focus is entirely on the implementation mechanisms that we have adopted and, most importantly, this paper demonstrates that the distinctive features of higher-order logic programming are a great fit for language-oriented systems.

  • This paper extends the work in [7] by adding language switches.

The K framework is a rewrite-based executable framework for the specification of programming languages [27]. Differently from , the K framework does not offer language-oriented programming features such as language switch and the updating of languages at run-time. On the other hand, K has been used to define real-world programming languages such as C [10] and Java [2], to name a few, while has not. The style of language definitions in is that of plain operational semantics. Therefore, defining real-world programming languages can become a cumbersome task, and we have not ventured into that task yet. An interesting feature of the K framework is that it automatically derives analyzers and verifiers from language specifications [29]. This work inspires us towards adding similar features to .

A number of systems have been created to support language-oriented programming [3, 17, 18, 21, 30]. Features such as language switch and the updating of languages at run-time are rather sophisticated in this research area, and not many systems offer these features [4, 12, 19, 20]. However, some language workbenches do provide these functionalities, such as Racket [13], Neverlang [30] and Spoofax [18]. Racket provides special syntax for defining languages and its implementation is based on macros. Language definitions are macro-expanded into Racket’s core functional language. Spoofax and Neverlang have an internal ad hoc representation of languages. Languages can be accessed and updated at run-time in Neverlang, for example, making use of the run-time loading features of the JVM to add parts to languages represented as objects. To our knowledge, is the only language-oriented system that is implemented in higher-order logic programming.

  is not more expressive than other systems. The goal of this paper is not to provide a system that surpasses the state of the art. is also a young tool (2018) compared to the mentioned systems, some of which boast decades of active development. The goal of our paper is, instead, about the effectiveness of higher-order logic programming in language-oriented programming. It is hard to compare our implementation to others at a quantitative level (lines of code) because systems such as Racket, Neverlang and Spoofax are very large and mature systems that offer all sorts of language services. We were not able to single out an isolated meaningful part of these systems to compare with our interpreter. Nonetheless, we believe that it is generally remarkable that we could implement the interpreter of a full language-oriented programming language with sophisticated features in 73 lines of code.

9 Conclusion

This paper describes the compilation methods that we have adopted in the implementation of , and provides evidence that high-order logic programming can be a great fit for implementing language-oriented systems.

The following aspects of higher-order logic programming have been particularly fitting:

(We add numbers, as some features have been helpful in more than one way).

  • Formulae as first-class citizens #1: List of formulae naturally models language definitions in operational semantics, providing a readily available run-time data type. This makes it easy to implement operations that manipulate languages (such as union and rule removal) during execution.

  • Formulae as first-class citizens #2: It models naturally the switch of evaluation strategy at run-time. This is thanks to the fact that we can pass premises to rules. These premises may or may not add new conditions under which existing rules can fire.

  • Hypothetical reasoning #1: It naturally models the execution of a program with a given operational semantics, possibly created at run-time.

  • Hypothetical reasoning #2: It naturally models the switch from executing a program using a language to executing another program using an extension of that language.

In the future, we would like to strengthen our message by implementing further operations on languages using high-order logic programming. We would like to implement operations such as language unification and restriction [11], grammar inheritance, language embedding, and aggregation from the Manticore system [21], and renaming and remapping from Neverlang [30]. There is notable work in [5, 22, 23] on inferring the dependencies of languages, which we also plan to implement.

Some systems compile languages and programs into proof assistants. For example, Ott compiles into Coq, HOL and Isabelle [28], so that users can carry out proofs in these systems. Currently, compiles into \(\lambda \)Prolog solely to execute programs. However, a subset of \(\lambda \)Prolog is also the specification language of the proof assistant Abella [1]. In the future, we would like to explore the verification of programs after compilation to \(\lambda \)Prolog/Abella code.

We point out that language workbenches offer a variety of editor services among syntax colouring, highlighting, outlining, and reference resolution, to name a few. Currently, is not equipped with a comfortable IDE. Inspired by the work on language workbenches, we would like to improve the usability of our system.