1 Introduction

Abstract interpretation is a technique to define sound static analyses [6]. Static analyses have proved useful in providing feedback to developers (e.g., dead code [4], type information), in finding bugs (e.g., uninitialized read [25], type errors [20]), and in enabling compiler optimizations (e.g., constant propagation [3], purity analysis [21]). It is therefore no surprise that the field of abstract interpretation and static analysis has seen significant attention both in academia and industry.

Unfortunately, the analysis of program transformations has not seen much attention so far. Program transformations are a central tool in language engineering and modern software development. For example, they are used for code desugaring, macro expansion, compiler optimization, refactoring, migration scripting, or model-driven development. The development of such program transformations tends to be difficult because they act at the metalevel and should work for a large class of potential input programs. Yet, there are hardly any static analyses for program transformation languages available, and it appears to be difficult to develop such analyses. To this end, we identified the following challenges:

  • Domain-Specific Features Program transformation languages such as Stratego [24], Rascal [14], and Maude [5] aim to simplify the development of program transformations. Therefore, they provide domain-specific language features such as rich pattern-matching, backtracking, and generic traversals. These domain-specific language features usually cannot be found in other general-purpose languages and the literature on static analysis provides only little guidance on how to tackle them.

  • Term Abstraction Programs are first-class in program transformations and are represented as terms (e.g., abstract syntax trees). Therefore, analysis developers need to find a good abstraction for terms, such as syntactic sorts or grammars [8]. This term abstraction heavily influences the precision and usefulness of the analysis and most of the analysis development effort should be spent on the design of this abstraction. We expect analysis developers to experiment with alternative term abstractions: The design of good abstract domains is inherent to the development of any abstract interpreter and cannot be avoided.

  • Soundness Developing an abstract interpreter that soundly predicts the generated code of program transformations is difficult. This is because real-world transformation languages have many edge cases and an abstract interpreter has to account for all of these edge cases to be sound. Furthermore, transformation languages often do not have a formal semantics, which makes it hard to verify that the abstract interpreter covered all cases.

In this paper we present a systematic approach to develop abstract interpreters for program transformation languages that addresses these challenges. It is based on the well-founded theory of compositional soundness proofs [13] and reusable analysis components [12]. In particular, our approach captures the core semantics of a transformation language with a generic interpreter [13] that does not refer to any analysis-specific details. This simplifies the analysis of the domain-specific language features. Furthermore, our approach decouples the term abstraction from the remainder of the analysis through an interface. This means that any term abstraction that implements this interface gives rise to a complete abstract interpreter. Thus, analysis developers can fully focus on developing good term abstractions. Lastly, our approach reuses language-independent functionality, such as abstractions for environments, exceptions and fixpoints, from the Sturdy standard library. This not only reduces the analysis development effort, but also simplifies its soundness proof as we can rely on the soundness proofs of the Sturdy library [12].

We demonstrate the feasibility and usefulness of our approach by developing abstract interpreters for Stratego [24]. Stratego is a complex dynamic program transformation language featuring rich pattern matching, backtracking, generic traversals, higher-order transformations, and an untyped program representation. Despite these difficulties, based on our approach we developed three novel abstract interpreters for Stratego: We developed a constant propagation analysis, a sort analysis, which checks that transformations are well-typed, and an advanced sort analysis, which can even validate type-changing generic traversals which produce ill-sorted intermediate terms. Our systematic approach was crucial in allowing us to focus on each of these abstract domains without being concerned with other aspects of the Stratego language. We implemented the analyses in Haskell in the Sturdy analysis framework and the code of the analyses is open-source.Footnote 1

In summary, we make the following contributions:

  • We propose a systematic approach to the development of abstract interpreters for program transformations, that lets analysis developers focus on designing the term abstraction.

  • We show that many features of program transformation languages can be implemented on top of existing analysis functionality and do not require specific analysis code.

  • We demonstrate the feasibility and usefulness of our approach by applying it to Stratego, for which we develop three novel abstract interpreters.

2 Illustrating Example: Singleton Analysis

The static analysis of program transformations can have significant merit helping developers to understand and debug their code and helping compilers to optimize the code. For example, we would like to support the following analyses: Singleton analysis to enable constant propagation, purity analysis to enable function inlining, dead code analysis to discover irrelevant code, sort analysis to prevent ill-sorted terms. While these and many other analyses would be useful, their development is complicated. In this section, we illustrate our approach by developing a singleton analysis for Stratego [24].

2.1 Abstract Interpreter for Program Transformations = Generic Interpreter + Term Abstraction

The development of analyses for program transformations is complicated for two reasons. First, each analysis requires a different term abstraction, with which it represents the generated code. The choice of term abstraction is crucial since it directly influences the precision, soundness, and termination of the analysis. Second, program transformation languages provide domain-specific language features such as rich pattern matching, backtracking, and generic traversals. Soundly approximating these features in an analysis is not easy, and resolving this challenge for each analysis anew is impractical.

In this paper, we propose a more systematic approach to developing static analyses for program transformations. To support static analyses for a given transformation language, we first develop a generic interpreter that implements the abstract semantics of the domain-specific language features in terms of standard language features whose abstract semantics is well-understood already. The generic interpreter is parametric in the term abstraction, such that we can derive different static analyses in a second step by providing different term abstractions. This architecture enables analysis developers to separately tackle the challenge of designing a good term abstraction.

We have developed a generic interpreter for Stratego based on the Sturdy analysis framework [12, 13] in Haskell. We explain the full details of generic interpreters and background about Sturdy in Sect. 3. Here, we only illustrate a small part of the generic interpreter, namely pattern matching.

figure a

Listing 1 shows the generic analysis code for pattern matching. We parameterized the pattern-matching function using a type class as an interface. Pattern matching interacts with the term abstraction to deconstruct terms but implements other aspects generically. In Listing 1, we have highlighted all calls to operations of ; the remaining code is generic. We provide a short notational introduction before delving deeper into the analysis code.

Our approach is based on Sturdy, which requires analysis code to be written in arrow style [11]. Like monads, arrows generalize pure functions to support side-effects in a principled fashion. For users of our approach, this mostly means that they have to use Haskell’s built-in syntax for arrows, as shown in Listing 1. Expression introduces an arrow computation similar to the pure . Do notation denotes a sequence of arrow commands, where each command takes the form or  [18]. Command calls on and stores the result in ignores the resulting value but not the potential side-effect of . For a more in-depth introduction to arrows, we refer to Hughes’s original paper [11] and online resources such as https://www.haskell.org/arrows.

The generic analysis code for pattern matching in Listing 1 describes a computation that is parametric in and , but restricts these types through type-class constraints. Type must implement the term abstraction interface . Type is an arrow that encapsulates the side-effects of the computation and must at least support environments and exception handling. We use these side-effects to implement pattern variables and backtracking in .

Computation takes a pattern and the matchee (the term to match) as input and yields the possibly refined term as output. For a wildcard pattern, we yield the matchee unchanged. For pattern variables, we look up the variable in the environment and distinguish two cases. If the variable is already bound to , we require the matchee to be equal to . If the variable is not bound yet, we insert a binding into the environment. For named subpatterns , we invoke the code for pattern variables recursively. The remaining four cases delegate to the term abstraction, passing the function for matching subterms as needed. When a pattern match fails, it throws an exception to reset all bound pattern variables.

The generic analysis code for pattern matching captures the essence of pattern matching in Stratego and closely follows Stratego’s concrete semantics. In fact, the generic code can be instantiated to retrieve a fully functional concrete interpreter for Stratego. This makes the generic interpreter relatively easy to develop: no analysis-specific code is required. All analysis-specific code resides in instances of interfaces like and . Sturdy further exploits this to support compositional soundness proofs of analyses [13].

2.2 A Singleton Term Abstraction

We can derive complete Stratego analyses from the generic interpreter by instantiation. Specifically, we need to provide implementations for the type classes it is parameterized over. For standard interfaces like and , we provide reusable abstract semantics. However, the term abstraction is language-specific and analysis-specific. Thus, this interface needs to be implemented by the analysis developer.

To illustrate the definition of term abstractions, here we develop a singleton analysis for Stratego. The analysis determines if (part of) a program transformation yields a constant output, such that the transformation can be optimized by constant propagation. Note that in this paper we are only concerned with the definition of analyses; the implementation of subsequent optimizations is outside the scope of the paper.

Each term abstraction needs to choose a term representation. For the singleton analysis, we use a simple data type with two constructors:

figure ai

A term means that the transformation produces a single concrete Stratego term of type . In contrast, means that the transformation cannot be shown to produce a single concrete term.

Based on such term representation, a term abstraction for Stratego must implement the 10 functions from the interface. We show the implementation of four of these functions in Listing 2 that also appeared in Listing 1.

figure ao

Function in Listing 2 defines a computation that takes a string value and a matchee of type as input. If denotes a single concrete term, delegates to the concrete string matching semantics using . However, if the matchee is , we cannot statically determine if the pattern match should succeed or fail. Thus, we join the two potential outcomes: Either pattern matching succeeds and we return unchanged, or pattern matching fails and we abort the matching by throwing an exception. Function is analogous to .

Function distinguishes three cases. The first case checks if matchee denotes a single concrete term with constructor and right number of subterms. If so, we recursively match the subpatterns against the subterms, converted to singletons. Then, if all submatches yielded singleton terms again, we refine the matchee accordingly. The second case occurs when denotes a singleton term but does not match the constructor pattern. In this case, we simply abort. Finally, if is , we combine the two cases using a list of terms as subterms. Note that the recursive match on the subpatterns is necessary to bind pattern variables that may occur.

2.3 Soundness

Our approach drastically simplifies the soundness proof of the abstract interpreter. In particular, by factoring the concrete and abstract interpreter into a generic interpreter, we do not have to worry about soundness of the generic interpreter. Instead, its soundness proof follows by composing the proof of smaller soundness lemmas about its instances [13]. Furthermore, because we instantiate the generic interpreter with sound analysis components for environments, stores and exceptions, we do not have to worry about soundness of these analysis concerns either [12]. All that is left to prove, is the soundness of the term operations.

2.4 Summary

Our approach to developing static analyses for program transformations consists of two steps. First, develop a generic interpreter based on standard semantic components and a parametric term abstraction. Second, define a term abstraction and instantiate the generic interpreter. While the term abstraction is language-specific and analysis-specific, the generic interpreter can be reused across analyses and only needs to be implemented once per transformation language. In the subsequent section, we explain how to develop and instantiate generic interpreters for transformation languages using standard semantic components. Sections 4 and 5.1 demonstrate the development of sophisticated term abstractions.

3 Generic Interpreters for Program Transformations

Creating sound static analyses is a laborious and error-prone process. While there is a rich body of literature on analyzing functional and imperative programming languages, static analysis of program transformation languages is under-explored. Most work in the area of program transformations so far focused on type checking, which considers each rewriting separately and is limited to intra-procedural analysis.

The key enabler of our approach are generic interpreters that can be instantiated with different term abstractions to obtain different analyses. In this section, we demonstrate our approach at the example of Stratego and show how to develop generic interpreters for Stratego. In particular, we show that the features of program transformation languages do not require specific analysis code but can be mapped to existing language concepts whose analysis is already well-understood.

3.1 The Program Transformation Language Stratego

Stratego is a program transformation language featuring rich pattern matching, backtracking, and generic traversals [24]. For example, consider the following desugaring of Java extended with pairs [9] in Listing 3. The two rewrite rules of the form above use pattern matching to select pair types and expressions, respectively. They then generate representations of pair types and expressions using the class. The rewriting strategy traverses the input AST top-down and tries to apply both rewrite rules at every node, leaving a node unchanged if neither rule applies. We also added the definitions of the higher-order functions and from the standard library. The built-in primitive takes a transformation and applies it to each direct subterm of the current term. Function uses to realize a generic top-down traversal over a term, applying to every node. Function uses left-biased choice to catch any failure in and to resume with the identity function instead. Furthermore, the Stratego compiler translates the rewrite rules of the form to transformations :

figure bx
figure by

The translated rule first matches the pattern , binding all pattern variables to the respective subterms and then builds the term using the abstract syntax of Java.

3.2 A Generic Interpreter for Stratego

We demonstrate how to map these language features to standard language concepts and how this enables static analysis of program transformations. To this end, we developed a generic interpreter for Stratego.Footnote 2 The generic interpreter is based on a previous Sturdy case study [13] that was never described in detail.

figure cb

We consider fully desugared Stratego code in our interpreter, ignoring Stratego’s dynamic rules. This core Stratego language [23] only contains 12 constructs as defined by the data type in Listing 4. We explain these constructs together with their generic semantics, shown in the same listing. The semantics is defined by a function that accepts a Stratego program and yields a computation of type , meaning that a Stratego program takes a term as input and yields another term as output. That is, Stratego programs are term transformations as expected. The arrow captures the side-effects of the computation, as explained in Sect. 2.1.

The first two core Stratego constructs deconstruct and construct terms. A transformation is based on a term pattern , which it matches against the input term  . Function from Listing 1 implements the actual pattern matching, as we have discussed in Sect. 2. Recall that binds pattern variables in the environment as a side-effect and throws an exception if the pattern match fails. We will see shortly how these side-effects are supported by the generic interpreter. A transformation is the dual of match: it constructs a new term according to the pattern, filling in information from the environment in place of pattern variables.

The next four core Stratego constructs handle control-flow. The identity transformation returns the input term unchanged. A sequence of transformations and pipes the output of into . The transformation never succeeds and always throws an exception using , which we also used to indicate failed pattern matches. To catch such exceptions, core Stratego programs can use guarded choice, written in Stratego notation. Guarded choice runs if fails (throws an exception) and otherwise. We implemented guarded choice using the function. Like is declared in the interface and allows us to catch exceptions triggered by . There are two things to note here:

  • The implementation of and are not specific to Stratego and are provided as sound reusable analysis components [12] by the standard library of Sturdy. We are effectively mapping Stratego features to these pre-defined features of Sturdy.

  • We can choose how exceptions affect the variables bound during pattern matching. For Stratego, we need exceptions to undo variable bindings in order to correctly implement backtracking. However, in other languages we may want to retain the state of a computation even after an exception was thrown.

The next three constructs handle scoping, strategy calls, and local strategy definitions. We discuss the first two of these in some detail. Stratego’s scoping is somewhat unconventional, because Stratego has explicit scope declarations and environments follow store-passing style. Variables listed in a scope declaration are lexically scoped as usual, but other variables can occur in the environment and must be preserved. We use function (at the bottom of Listing 4) to implement this scoping. First, we unbind the scoped variables from the current environment to allow pattern matching to bind them afresh. Second, after the scoped code finishes, we restore the bindings of scoped variables from the old environment while retaining other bindings from the current environment unchanged. Scoping also occurs when calling a strategy. To evaluate a call, we first find the strategy definition, then lookup the term arguments in the current environment, and then invoke the strategy using for the term parameters .

The final three constructs are generic traversals that use to call on the subterms of the current input term. Function is part of the interface and thus analysis-specific because depends on the term representation. Functions are part of the generic interpreter and ensure that, respectively, exactly one, at least one, or all of subterms are transformed by the given strategy . This way our generic interpreter separates term-specific operations from operations that can be defined generically.

3.3 The Term Abstraction

At this point, all that it takes to define a Stratego analysis is to implement the interface for a new term abstraction. The rest of the analysis is given by the generic interpreter and reusable functionality from the Sturdy library.

The generic interpreter described in the previous section crucially relies on the term abstraction. In particular, pattern matching, term construction, and generic traversals must inspect or manipulate terms. In Sect. 2 we have seen how used term operations and how we could implement these for the singleton term abstraction. Here we show the complete interface for term abstractions.

Stratego terms are strings, numbers, or constructor terms:

figure dq

Our interface must at least provide operations to match and construct such terms. In addition, we must support Stratego’s generic traversals and explode patterns. Note that Stratego represents lists using constructors and :

figure dt

We designed an interface for term abstractions of Stratego terms that requires only 10 operations. Listing 5 shows the corresponding type class. The interface contains four functions for pattern matching, four functions for term construction, one equality function, and one function to map subterms.

figure du

We have discussed the functions for pattern matching Sect. 2 already. Function takes a function for matching subterms against subpatterns. Function takes functions for matching the constructor name and the subterms. The functions for term construction are straightforward. While function takes a String and a list of terms, function takes two terms. The first of these terms must be a string term, the second one must represent a list of terms. Finally, we require functions for checking the equality of two terms and for mapping a function over a term’s subterms. This last function enables generic traversals as shown in Listing 4.

Our interface for term abstractions can be instantiated in various ways by defining instances of the type class. We have shown an instance for the singleton term abstraction in Listing 2 and will describe further term abstractions in the upcoming sections. But it is worth noting that the interface can also be instantiated for concrete Stratego terms:

figure dz

This concrete term instance allows us to run the generic interpreter as a concrete Stratego semantics. This is not only great for testing the generic interpreter against a reference implementation of Stratego, but also crucial for proving the soundness of term abstractions against the concrete semantics.

To summarize, we implemented the Stratego language semantics as a generic interpreter based on a few term operations only. The generic interpreter maps many aspects of Stratego language to standard language concepts such as environments and exceptions. For these language concepts, we reuse the abstract semantics found in the Sturdy standard library. In the end, to design and implement a new analysis for Stratego, all it takes is a new term abstraction. We exploit this reduction of effort in the next two sections, where we develop two novel static analyses for Stratego by defining term abstractions.

4 Sort Analysis

In this section, we define an inter-procedural sort analysis for Stratego. The analysis checks if a program transformation generates well-formed programs and to which sort the program belongs. That is, we implement a term abstraction where we choose to represent terms through their sort.

4.1 Sorts and Sort Contexts

We describe the sorts of Stratego terms by the following Haskell datatype:

figure ea

Sort represents string values, represents numeric values. We use to represent named sorts such as . We further include sorts for representing Stratego’s lists, tuples, and option terms. Finally, represents the empty set of terms and represents all terms (also ill-formed ones). This means, we can guarantee a term is well-formed if its sort is not .

To associate terms to sorts, we parse the declaration of constructor signatures that are part of any Stratego program. Typically, these declarations are automatically derived from the grammar of the source and target language.

figure ei

Each line declares a constructor, the sorts of its arguments and the generated sort. We allow overloaded constructor signatures as long as they generate terms of the same sort. That is, if \(c:s_1 \ldots s_m \rightarrow s \in \varGamma \) and \(c: s_1' \ldots s_n' \rightarrow s' \in \varGamma \), then \(s = s'\).

The third signature declares that any term of sort should also be considered a term of sort . This is the result of injection production in the grammar and effectively declares a subtype relation . Dealing with subtyping correctly is one of the major challenges of developing a sort analysis. Thanks to our separation of concerns, we can fully focus on that challenge here.

We collect all constructor signatures and the subtyping relation in a sort context:

figure em

Since we require the context when operating on sorts, we actually represent terms abstractly as a pair . However, all terms refer to the same context and the context never changes. To simplify the presentation in this paper, we assume the context is globally known and terms are represented by alone.

4.2 Abstract Term Operations

In the remainder of this section, we explain how to implement the term abstraction for our sort analysis. To this end, we have to provide an instance of type class as shown in Listing 6. We only show the code for lists and user-defined constructor and omit the other cases for tuples and optionals for brevity.

figure eq

As a warm-up, consider operation that yields sort independent of the string literal. When matching a string against sort in , the match can only succeed if terms may be part of terms. Otherwise the match must fail.

Arguably the most interesting part of the term abstraction is building and matching constructor terms. Let’s start with operation , which obtains the constructor name and the list of subsorts . In Stratego, list, tuple, and optional terms use reserved constructor names. We include one case for each reserved constructor to generate the appropriate sort. For example, constructor can be applied to an empty argument list to generate an empty list. This list has sort . Constructor generates a compound term that has sort list if the second argument was a list. The sort of the resulting list is the least super-sort (\(\sqcup \)) of the new head list and the tail. The empty constructor generates tuples; and generate optional terms.

The last case of handles user-defined constructor symbols . We use to look up the signatures of from the sort context. We only retain those signatures that can accept the constructor arguments . Finally, we collect all result sorts and compute their greatest lower bound. If none of the signatures matches, we return sort . For example, consider the call:

figure fo

If the signature of is , we obtain as result. If the signature is instead declared as , we obtain because the constructed term is ill-formed (unless is a sub-sort of ).

Operation is quite complex, although all cases for reserved constructors follow the same pattern:

  1. 1.

    We check if the sort of the current term is compatible with the matched constructor. For example, a match against can only succeed if the sort is a list.

  2. 2.

    We retrieve the subterm sorts if any. For example, for we have two subterms: the head element and the tail list. Auxiliary function carefully finds all possible list elements, taking subtyping into account.

  3. 3.

    We match the subterms against the subpatterns, yielding refined subterms .

  4. 4.

    We refine the current term by calling on the refined subterms and the matched constructor. Since matching may always fail, we join the result with a call to .

The last case of again handles user-defined constructor symbols . We use to obtain all constructors and their argument types . If the constructor has the required name and the right number of arguments, then the corresponding match might succeed. We match the subterms and refine the current term as in the other cases, but the we compute the least upper bound over all possible results. For example, when we match a constructor against sort , we would lookup all constructors that generate sort . For the match can succeed, but for the match must fail. The join operator merges the results to compute a sound approximation.

Lastly, we show the code of , which needs to retrieve the current subterms as a list and pass them to . However, sorts do not directly point out their subterms. Again we use to retrieve the sorts of subterms indirectly by finding all constructors of the current sort and taking their parameter lists. For example, if we call with sort , then computation f will be called on for constructor and on for constructor .

Fig. 1.
figure 1

A simplified trace of the sort analysis of the pair desugaring, where we abbreviate with .

To summarize, in this section we defined a sort analysis for Stratego, simply by designing a sort term abstraction which implements the interface. The rest of the analysis we get for free from the generic interpreter and reusable analysis code. As the reader probably noticed, the term abstraction for sorts is fairly complex in its own right. Being able to focus on the term abstraction without considering other analysis aspects was crucial.

4.3 Sort Analysis and Generic Traversals

In this subsection, we showcase the inter-procedurality of our sort analysis by analyzing generic traversals. A generic traversal traverses a syntax tree independent of its shape and transforms the visited nodes. Statically assigning types to a generic traversal is notoriously difficult, because the type needs to summarize all changes the traversal does to the entire tree. In this subsection, we will illustrate how our inter-procedural sort analysis can support some generic traversals, before refining our analysis further in the subsequent section.

Consider the trace of the sort analysis (Fig. 1) of the pair desugaring from Sect. 3.1. The trace starts in the function with an input term of sort . The function calls , which calls , which calls the desugaring rules . The rule either yields a term of sort or fails because the pattern matches some but not all terms of sort . Furthermore, the rule definitely fails because no terms of sort match the pattern . Even though one of the rules failed, the call produces a successful result by applying the input term to the identity transformation. The function then passes the resulting term of sort to the generic traversal . Since we know the sort of the current term, we enumerate all relevant constructors and the sorts of their direct subterms and recursively analyze the desugaring for them. In the example trace of Fig. 1, we consider three subterm sorts of . The second and third recursive call to resolve easily, whereas the first recursive call would end up in a cycle (shaded nodes in Fig. 1). To this end, we use a fixpoint algorithm with widening to ensure that the analysis terminates.

The example shows why it is hard to analyze the type of a generic traversal: For different input sorts, a generic traversal might produce different output sorts. Therefore, our sort analysis reanalyzes a generic traversal for each input sort, instead of assigning a fixed type like a type checker would do.

The example we considered here is a special case of generic traversals, known as type-preserving. A generic traversal is type-preserving if the sort of the input and output term are the same at every node. However, some generic traversals change the sort of the input term. The sort analysis of this section is not capable of analyzing such type-changing generic traversals. To this end, we require a more precise analysis, which we develop in the following section.

5 Locally Ill-Sorted Sort Analysis

Many program transformations, like a compiler, translate terms from one sort to terms of another sort. When these program transformations use generic traversals, they produce mixed intermediate terms, which contain subterms of the input sort and subterms of the output sort. Because mixed intermediate terms are not well-sorted, these program transformations are challenging to type check.

Fig. 2.
figure 2

The top of the figure contains a type-changing generic traversal that translates boolean to numeric expressions. The bottom contains the analysis trace of the transformation, where we abbreviate with .

For example, consider the traversal in Fig. 2 that translates Boolean expressions into numeric expressions in a bottom-up fashion. The boolean expression is transformed in two steps:

figure hw

Even though the input term is a valid boolean expression and the output term a valid numeric expression, the transformation creates an intermediate term , which is ill-sorted. The sort analysis of the previous section is only able to check transformations which produce well-sorted terms and therefore cannot handle this example. To analyze this example, we need a more precise sort analysis that can represent ill-sorted terms, which we develop in the remainder of this section

5.1 Term Abstraction for Ill-Sorted Terms

The key idea is to use a term abstraction which can represent terms with well-sorted leafs and an possible ill-sorted prefix, such as . This abstract term represents all terms with as top-level constructor and two numeric expressions as subterms. We implement this term abstraction with the following Haskell type:

figure ic

The case represents well-sorted terms that belong to sort , and the case represents terms with an possibly ill-sorted prefix. For example, this datatype allows us to represent the ill-sorted term with the abstract term

figure ih

5.2 Abstract Term Operations

We develop an analysis for Stratego by implementing the term operations with the term abstraction from above. We only discuss the and operations (Listing 7), because the remaining functions are similar to the operations of the sort analysis.

figure ik

The operation first matches on the term representation and in both cases calls the helper function, which compares the constructors, arity and subterms. The function, similar to Listing 6, looks up all constructor signature for a sort, but additionally converts the signatures to abstract terms. This operation is more than the of the sort analysis, because we may know the top-level constructor of the term. This improved precision results in more pattern matches which unconditionally succeed or fail.

In contrast to the sort analysis, the operation in Listing 7 does not check if the constructor and its subterms belong to a valid sort. Instead, it constructs a new abstract term, which may or may not be well-sorted. The type checking of this term is then delayed until a later point.

With these definitions, the analysis would be able to check some type-changing generic traversals, however, it might not terminate because the abstract terms might grow arbitrarily large. To avoid this problem, we reduce the size of abstract terms by type checking their subterms. For example, we can type check the immediate subterms of to obtain the abstract term . In the new term, the sort \(\top \) indicates the type checking of failed and the term is ill-sorted. We use this technique in a widening operator [7] that ensures that the analysis terminates. The operator simply type checks all subterms deeper than a certain limit k, such that the resulting terms are not deeper than k.

5.3 Analyzing Type-Changing Generic Traversals

In the remainder of this section, we discuss how the analysis of this section checks type-changing generic traversals. To this end, we discuss an analysis trace of the example at the beginning of this of this section (Fig. 2).

The trace in Fig. 2 shows only the final fixpoint iteration (earlier iterations produce subsets of the sets shown in the trace). It starts with the analysis of the function with the boolean expression sort , which is then passed to . In contrast to the top-down traversal, the bottom-up traversal first traverses with over the subterms of boolean expressions and replaces them by numeric expressions, e.g., . The resulting set of ill-sorted terms is then passed to the rewrite rule . The rule then replaces each top-level boolean constructor with a numeric constructor without touching the subterms. All terms in the resulting set are now well-typed and applies the widening operator to reduce this set to .

In summary, we defined an advanced sort analysis, which can represent ill-sorted terms. This analysis is able to check type-changing generic traversals, which produces ill-sorted intermediate terms.

6 Related Work

Transformation languages like Stratego [10] and PLT Redex [17] have a dynamic type checker for syntactic well-formedness. While dynamic type checking supports generic traversals, it does not help developers of transformations to understand the code. In contrast, we developed a static analysis such that program transformations can be checked before running them.

Other program transformation languages like Ott [22], Maude [5], Tom [2] and Rascal [14] use static type checking to ensure syntactic well-formedness. However, these languages do not support or struggle to statically check arbitrary generic traversals. Ott is a language for specifying rewrite systems and exporting them to proof assistants such as Coq or Isabelle. However, it does not support generic traversals. Maude is a language for specifying rewrite systems in membership equational logic. However, it implements generic traversals with reflection and hence cannot statically check their type. Tom and Rascal are statically typed transformation languages with support for type-preserving generic traversals. However, they do not support type-changing generic traversals. We explained in Sect. 5 why conventional static type checkers cannot analyze type-changing generic traversals: these traversals produce intermediate terms which are ill-sorted. In this work, we aim to analyze type-preserving as well as type-changing generic traversals. We solve this problem by defining a static analysis which can represent terms with a finite ill-sorted prefix. In contrast to a conventional type checking, this term abstraction is more precise than regular types, but requires computing a fixed point.

Lämmel distinguishes “type-preserving” from “type-unifying” generic traversals [15], as realized in Scrap-Your-Boilerplate [19]. A unifying generic traversal is a fold over the term that yields a value of the same “unified” type at each node. These kinds of generic traversals are easier to type statically, however, not all generic traversals fit in one of these two typing schemes. For example, a generic traversal that translates code from one language to another is neither type-preserving nor type-unifying. Rather than developing additional specialized traversal styles, our paper aims to support static analysis for arbitrary generic traversals.

Most closely related to our work, Al-Sibahi et al. present an abstract interpreter of a subset of Rascal, including generic traversals [1]. Al-Sibahi et al. use inductive refinement types as abstract domain. The main difference of our work is that we separated analysis-independent concerns (the generic interpreter) from analysis-specific concerns (the instances). This way we can develop different analyses for program transformations with relatively little effort. Furthermore, it also simplifies the analysis definition, because most of the language complexity is captured in the generic interpreter. Lastly, our work is based on the well-founded theory of compositional soundness proofs [13] provided by the Sturdy framework. This allows us to verify that soundness of analyses more easily, as we only need to prove that the instances are sound.

CompCert [16] is a formally verified C compiler. The compiler guarantees that the compiled program has the same semantics as the input program. To this end, each program transformation in the compiler passes has to preserve the semantics of the transformed program. While CompCert focuses on the semantics of the transformed program, the static analyses for program transformations in this work have to satisfy a different correctness property. Soundness of these static analyses guarantees that the analyses results overapproximate which programs can be generated by a program transformation. However, soundness does not give any guarantees about the semantics of the transformed program. In the future, we aim to develop more precise analyses for program transformation languages that allow us to draw conclusion about the semantics of transformed programs.

7 Conclusion

To summarize, in this work, we presented a systematic approach to designing static analyses for program transformations. Key of our approach is to capture the core semantics of the program transformations with a generic interpreter that does not refer to any analysis-specific details. This lets the analysis developer focus on designing a good abstraction for programs. We demonstrated the usefulness of our approach by designing three analyses for the program transformation language Stratego. Our sort analyses are able to check the well-sortedness of type-preserving and even type-changing generic traversals.