Keywords

1 Introduction

Genetic Programming (GP)  [28] has been successfully applied in different areas, including bioinformatics  [13], quantum computing  [35], and supervised machine learning  [19]. One of the main challenges of applying GP to real-world problems, such as program synthesis, is the efficient exploration of the vast search space. Frequently, domain knowledge can be used to restrict the search space, making the exploration more efficient. Strongly Typed Genetic Programming (STGP)  [24] restricts the search space by ignoring candidates that do not type check. To improve its expressive power, STGP has been extended with type inheritance  [10], polymorphism  [37] and a Hindley-Milner inspired type system  [22], the basis for those in Haskell, SML, OCaml or F\(\sharp \).

Grammar-Guided Genetic Programming (GGGP)  [21] also restricts the search space, sometimes enforcing the same rules as STGP, only allowing the generation of individuals that follow a given grammar. Grammar-based approaches have also been developing towards restricting the search space. The initial proposal [21] used context-free grammars (CFG) in the Backus Normal Form. The GAUGE [31] system relies on attribute grammars to restrict the phenotype translation from a sequence of integers. Christiansen grammars  [6, 33] can express more restrictions than CFGs, but still have limitations, such as variable scoping, polymorphism or recursive declarations.

We propose Refinement Typed Genetic Programming (RTGP) as a more robust version of STGP through the use of a type system with refinements and dependent types. Languages with these features have gained focus in the Programming Languages (PL) research area: LiquidHaskell [36] is an extension of Haskell that supports refinements; Agda  [2] and Idris  [3] are dependently-typed languages that are frequently used as theorem provers. These languages support the encoding of specifications within the type system. Previously, special constructs were required to add specification verification within the source code. This idea was introduced in Eiffel  [23] and applied later to Java with the JML specification language  [18].

In particular, our major contributions are:

  • A GP approach that relies on a simple grammar combined with a dependent refined type system, with the argument that this approach is more expressive than existing approaches;

  • Concretisation of this approach in the Æon Programming Language.

These contributions advance GP through a new interface in which to define representations and assess their success. One particular field where our approach might have a direct impact is general program synthesis. We identify two difficulties in the literature  [15]: a) the large search space that results from the combination of language operators, grammar and available functions, and b) the lack of a continuous fitness function. We address both aspects within the same programming language.

In the remainder of the current paper we present: the Æon language for expressing GP problems (Sect. 2); a method for extracting fitness functions from Æon programs (Sect. 3); the Refined Typed Genetic Programming approach (Sect. 4); examples of RTGP (Sect. 5); a comparison with other approaches, from a usability point of view (Sect. 6); and concluding remarks (Sect. 7).

2 The Æon Programming Language

We introduce the Æon programming language as an example of a language with polymorphism and non-liquid refinements. This language can be used as the basis for RTGP due to its support of static verification of polymorphism and a subset of the refinements. However, RTGP is not restricted to this language and could be applied to other languages that have similar type systems.

Listing 1.1 presents a simple example in Æon. To keep Æon a pure language, several low-level details are implemented in a host language, which Æon can interact with using the construct. The function is an example of a function whose definition is done in the native language of the interpreterFootnote 1.

figure d

What distinguishes Æon from strongly typed mainstream languages like C or Java is that types can have refinements that express restrictions over the types. For instance, the refinements on specify that the second argument must be greater than the first, and the output array has size equal to their different.

The call in the function throws a compile error because i is an Integer, and there are integers that are not greater than 0 (the first argument). The i argument should have been of type i:Int | i > 0. However, there is another refinement being violated because if \(i=1\), the size of the output will be 2 and not 10 as expected. The correct input type should have been for the function to compile.

It should now be clear how a language like Æon can be used to express domain-knowledge in GP problems. A traditional STGP solution would accept any integer value as the argument for range and would result in a runtime-error that would be penalized in the fitness function. Individual repair is not trivial to implement without resorting to symbolic execution, which is more computationally intensive than the static verification applied here.

The function, while very basic, is a simple example of the definition of a search problem. The function receives any integer (as there are no restrictions) and returns an integer greater than the one received and whose Fibonacci number is divisible by 100. A placeholder hole (\(\blacksquare \)) is left as the implementation of this function (inspired by Haskell’s and Agda’s holes [8]). The placeholder allows the program to parse, typecheck, but not executeFootnote 2. Typechecking is required to describe the search problem: Acceptable solutions are those that inhabit the type of the hole, . This is an example of a dependent refined type as the type of r depends on the value of n in the context. This approach of allowing the user to define a structure and let the search fill in the details has been used with success in sketch and SMT-based approaches  [34].

While the Æon language does not make any distinction, there are two classes of refinements for the purpose of RTGP: liquid and non-liquid refinements. Liquid refinements are those whose satisfiability can be statically verified, usually through the means of an SMT solver. One such example is SMT solvers can solve this kind of linear arithmetic problems. Another example is because is the same as size(x) where size is an uninterpreted function in SMT solving.

Non-liquid refinements are those that SMT solvers are not able to reason about. These are typically not allowed in languages like LiquidHaskell [36]. One example is the second refinement of function, because the verification of correctness requires the execution of the function, which can only be called during runtime  [7], typically for runtime verification. Another example of a non-liquid refinement would be the use of any natively defined function because the SMT cannot be sure of its behaviour other than the liquid refinement expressed in its type. For instance, when considering a native function that makes an HTTP request, an SMT solver cannot guess statically what kind of reply the server would send.

3 Refinements in GP

Now that we have addressed the difference between liquid and non-liquid refined types, we will see how both are used in the RTGP process. Figure 1 presents an overview of the data flow of the evolutionary process, starting from the problem formulation in Æon and ending in the solution found, also in Æon. The architecture identifies compiler components and the Æon code that is either generated or manipulated by those components.

Fig. 1.
figure 1

Architecture of the proposed approach.

3.1 Liquid Refinements for Constraining the Search Space

Liquid Refinements, the ones supported by Liquid Types  [30], are conjunctions of statically verifiable logical predicates of data. We define all other refinements as non-liquid refinements. An example of a liquid type is , where \(x > 3\) and \(x < 7\) are liquid refinements.

In our approach, liquid refinements are used to constraint the generation of candidate programs. Through the usage of a type-checker that supports Liquid Types, we are preventing candidates that are known to be invalid from being generated in the first place. However, the use of a type-checker is not ideal, as several invalid candidates might be generated before one that meets the liquid refinement is found. Synquid  [29] is a first step in improving the performance of liquid type synthesis. Synquid uses an enumerative approach and an SMT-solver to synthesize programs from a Liquid Type. However, Synquid has several limitations for this purpose: it is unable to synthesize simple refinements related to numerical values (such as the previous example), it is deterministic since it uses an enumerative approach, and while it is presented as complete with regards to the semantic values (can generate all programs that can be expressed in the subset of supported Liquid Types), it it not complete with regards to the syntactic expression. As an example, Synquid is able to synthesize 2 as an integer, but not 1+1, since the semantic values are equivalent. For the purposes of GP, not being able to synthesize more complex representations of the same program prevents recombination and mutation from exploring small alternatives. We are in the process of formalizing an algorithm that is more complete than Synquid for this purpose.

The RTGP algorithm (Sect. 4) uses the liquid type synthesis algorithm for:

  • Generating random input arguments in fitness evaluation (Sect. 3.2);

  • Generating a random individual in the initial population;

  • Generating a subtree in the mutation operator;

  • Generating a subtree in the recombination operator, when the other parent does not have a compatible node.

3.2 Non-liquid Refinements to Express Fitness Functions

A good fitness function is ideally continuous and should be able to measure how close to the real solution a potential solution is. However, fulfiling a given specification is a boolean criterion: it either is fulfilled or not. While previous work  [15] has used the number of passed tests as the fitness function, we aim to have a more fine-grained measurement of how far each test is from passing. In particular, we consider the overall error as the fitness value, and the search as a minimization problem.

We propose the use of non-liquid refinement types to synthesize a continuous fitness criteria from the specification (depicted in Fig. 1) that, together with randomly generated input values, is used to obtain the fitness function.

figure s

Listing 1.2 shows an example with a liquid refinement (\(r > n\)) and two non-liquid clauses in the refinement. Each of these two clauses is handled individually as in a multi-objective problem. Each clause is first reduced to the conjunctive normal form (CNF) and then converted from a predicate into a continuous function that, given the input and expected output, returns a floating point number between 0.0 and 1.0, where the value represents the error. For instance, the example in Listing 1.2 is converted to two functions:

figure t

Table 1 shows the conversion rules between boolean expressions and corresponding continuous values. The function f, which is defined using these rules, is applied recursively until a final continuous expression is generated. This approach is an extension of that presented in [12].

Table 1. Conversion function f between boolean expressions and continuous values.

Since the output of f is an error, the value \(\varvec{\mathsf {true}}\) is converted to 0.0, stating that condition holds, otherwise 1.0, this being the maximum value of not complying with the condition. Variables and function calls are also converted to 0.0 and 1.0 on whether the condition holds or not. Equalities of numeric values are converted into the normalized absolute difference between the arguments. The normalization is required as it allows different clauses to have the same importance on the given specification. Inequalities are converted to equalities and its difference with 1, negating the fitness result from equality. Conjunctions are converted to the average of the sum of the fitness extraction of both operands. Disjunctions value is obtained by extracting the minimum fitness value of both clauses. The minimum value indicates what clause is the closest to no error. Conditional statements fitness is recursively extracted by using the material implication rule. Similarly to inequalities, the negation of conditions denies the value returned by the truth of the condition. Numeric value comparisons represented a harder challenge as there are intervals where the condition holds. We use the difference of values to represent the error. In the < and > rules, the \(\delta \) constant depends on the type of the numerical value, 1.0 for integers and 0.00001 for doubles, and is essential for the extra step required for the condition to hold its truth value. A rectifier linear unit was used to ensure that if the condition holds, it is set to the maximum between the negative number and 0, otherwise, if the value is greater than 0, the positive fitness value is normalized.

The fitness function is the result of applying each \(f_i\) for each non-liquid refinement to a set of randomly generated (using the liquid synthesis algorithm in Sect. 3.1) input values. The fitness of an individual is the combination of all \(f_i\) for all random input values.

4 The RTGP Algorithm

The proposed RTGP algorithm follows the classical STGP  [24] in its structure but differs in the details. Just like in all GP approaches, multiple variants can be obtained by changing or swapping some of the components presented here.

4.1 Representation

RTGP can have either a bitstream representation (e.g., [31]) or a direct representation (e.g., [24]). For the sake of simplicity, let us consider the direct representation in the remainder of the paper.

4.2 Initialization Procedure

To generate random individuals, the algorithm mentioned in Sect. 3.1 is used with the context and type of the \(\blacksquare \) as arguments. This is repeated until the population has the desired size. Koza proposed the combination of full and grow as ramped-half-and-half [14], which is used in classical STGP. In RTGP, the full method is not always possible, since no valid expression with that given depth may exist in the language. If, for instance, we want an expression of type X and the only function that returns X is the constructor without any parameters. In this case, it is impossible to have any expression of type X with d greater than 1. Unlike in the STGP full method, a tree is used in the initial population, even if it does not have the predetermined depth.

4.3 Evaluation

The goal of the search problem is the minimization of the error between the observed and the expressed specification. Non-liquid refinements are translated to multi-objective criteria (following the approach explained in Sect. 3.2). The input values are randomly generated at each generation to prevent overfitting  [9]. A fitness of 0.0 for one clause represents that all sets of inputs have passed that condition successfully. The overall objective of the candidate is to obtain a 0.0 fitness in all clauses.

4.4 Selection and Genetic Operators

Recent work has provided significant insights on parent selection in program synthesis  [11]. A variant of lexicase selection, dynamic \(\epsilon \)-Lexicase  [17] selection, has been used to allow near-elite individuals to be chosen in continuous search spaces.

The mutation operator chooses a random node from the candidate tree. A replacement is randomly generated by providing the node type to the expression synthesis algorithm along with the current node depth, fulfiling the maximum tree depth requirement. The valid subtrees of the replaced node are provided as genetic material to the synthesizer, allowing partial mutations on the candidate.

The crossover operator selects two random parents using the dynamic \(\epsilon \)-lexicase selection algorithm. A random node is chosen from the first parent, and nodes with the same type from the second parent are selected for transplantation into the first parent. If no compatible nodes are found, the expression synthesizer is invoked using the second parent valid subtrees, and the remaining first parent subtrees as genetic material. This is similar to how STGP operates, with the distinction that subtyping in Liquid Types refers to the implication of semantic properties. Thus, unsafe recombinations and mutations will never occur.

4.5 Stopping Criteria

The algorithm iterates over generations of the population until one or multiple of the following criteria are met: a) there is an individual of fitness 0.0; b) a predefined number of generations have been iterated; c) a predefined time duration has passed.

5 Examples of RTGP

This section introduces three examples from the literature implemented in Æon.

5.1 Santa Fe Ant Trail

The Santa Fe Ant Trail problem is frequently used as a benchmark for GP. In [26], the authors propose a grammar-based approach to solve this problem. In RTGP, if-then-else conditions and auxiliary functions (via lambda abstraction) are embedded in the language, making this a very readable program.

figure u

5.2 Super Mario Bros Level Design

The second example defines the search for an interesting design for a Super Mario Bros level that maximizes the engagement, minimizes frustration and maximizes challenge. These functions are defined according to a model that can easily be implemented in Æon (Listing 1.4). We present this as a more usable alternative to the one that uses GGGP  [32].

figure v

Compared with the proposed grammar  [32], the complexity is similar and productions in either version are directly correspondent. The Æon version is arguably more expressive because the combinations of repetitions of objects with minimum and maximum number of repetitions can be bounded using types (enemies and boxes).

5.3 Logical Gates

The third example is taken from [27], where the goal is to “given any logical function, find a logically equivalent symbolic expression that uses only the operators in one of the three following complete sets: and, or, not, nand, nor". The authors propose a Christiansen grammar, which is context-sensitive, to express this problem. Listing 1.5 presents a more simple implementation of the problem using Æon. It can be argued that the implementation using refinements comes more directly from the problem statement than the complex dynamic grammar used in [27]. Furthermore, the implementation of the operations can be done directly in the same language.

figure w

6 Discussion

This section compares RTGP with GGGP and presents arguments why RTGP could be used instead of GGGP. Because Dependent Types can encode grammars  [5], the performance of both approaches is equivalent.

6.1 A Direct Comparison with GGGP

A survey on GGGP  [21] identified the advantages and disadvantages of GGGP. We compare with RTGP on the advantages:

  • Ability to declaratively restrict the search space—A type system is used instead of a grammar to express the restriction.

  • Problem Structure—Problem domains that already follow a grammar structure can be easily encoded in RTGP. RTGP can more directly encode several problems than a grammar. Two examples are General-purpose programming and the Logical Gates problem (Sect. 5.3).

  • Homologous Operators—Both GGGP and RTGP restrict the replacement of one component by another of similar close values.

  • Flexible Extension—Extensions to GP can be encoded both in grammars and dependent types. Both approaches can be used as engines to test other GP concepts.

And disadvantages:

  • Feasibility Constraints—Both GGGP and RTGP make the design of new operators a more significant challenge than in STGP, given that operators should follow the constraints imposed by the system. All RTGP operators are shared among any problem and rely solely on two algorithms: the type checker and expression synthesis.

  • Repair Mechanisms—Implementing repairing in GGGP often depends on the grammar. RTGP relies on n expression synthesis algorithm (Sect. 3.1) that generates individuals in a way that constraints are never violated. The same has been done for the mutation and crossover operators. However, a repair mechanism is straightforward: the type-checker identifies the malign node, and expression synthesis generates a replacement.

  • Limited Flexibility—GGGP is flexible when the program can be directly encoded in a context-free grammar. Some GGGP approaches use context-sensitive grammars (CSGP)  [27], but readability can become a problem (explained in Sect. 6.2).

  • Turing Incompleteness—GGGP supports grammars with semantics that allow the encoding of Turing-complete and incomplete problems. As such, GGGP does not offer any additional support for computation paradigms such as recursion and iteration, like other GP systems. RTGP supports both recursion and iteration directly, unless otherwise specified.

6.2 Usability

Instead, the main argument for RTGP over GGGP is one that concerns with usability. First, RTGP provides an integrated environment for describing the context, the problem, the search space and the solutions. Taking Æon as an example of RTGP, the environment in which the final program will execute can be defined, relying on native functions to use software written in other programming languages. The problem is defined using refined types for the goal of the system, and a hole marker (\(\blacksquare \)) is left as a placeholder for the program we are looking for. The search space is defined by the types used in the problem definition. Finally, the solution is a program in the same language as everything else, so it is ready to execute (and be evaluated).

On the other hand, if one were to use GGGP, one would have to create each of these components individually. The lack of a de-facto standard framework for GGGP helps this argument, in which interfacing with the context can be more complicated than implementing GGGP itself. GGGP concerns only with the description of the search space, while RTGP provides an integrated view of using GP.

The strongest point for RTGP is that it does not require the user to define a grammar. Just by placing holes in a program, users can use RTGP without even knowing how to define a grammar. Instead, they need to know how to use a familiar programming language (which to implement GGGP is already required) and to know how to express desired properties in refined types. While refined types have not yet become mainstream, several languages have feature subsets of its features for a long time. Eiffel  [23] supported pre- and post-conditions since 1986. Ada is another language that supports design by contract  [4], and it is very popular for critical embedded development, being used for large projects in air traffic control  [16] with more than 1 million lines of code. Advanced type systems have become more popular to prevent bugs from existing in codebases. Mozilla created Rust to avoid concurrency issues in the Firefox browser  [20], and Microsoft is using PL and SMT-based techniques to verify low-level critical components of the kernel and drivers  [1].

7 Conclusions and Future Work

We have presented Refinement Typed Genetic Programming (RTGP) as an approach to describe search problems in an integrated programming language. We have introduced a language, Æon, capable of expressing the environment, the fitness function, the search space, and the solution. The language features an advanced type system with liquid and non-liquid types. We have provided a methodology to generate the fitness function from non-liquid refined types, and we have introduced an algorithm that generates expressions from any inhabitable type in this language.

In Sect. 6 we have compared RTGP against GGGP, concluding that they are equivalent in expressiveness. However, we argue that RTGP provides better usability for end-users than GGGP, in which all aspects of the evolution have to be implemented. Furthermore, expressing restrictions in types allows more modular programs and better readability inside an integrated experience for defining and using RTGP.

There are still some aspects to explore with regards to RTGP: identifying the most efficient representation; improving the liquid type synthesis; finding the best representation for non-functional properties of programs; how to integrate this synthesis in an integrated editor; and to perform a exhaustive benchmark performance analysis.