1 Introduction

Generating C is an odd problem: at first glance, it is trivial and nothing to write about. Beyond simple applications, however, complexity, traps, hazards snowball. One may as well write code directly in C to start with, or use a compiler.

Neither of these two choices may be palatable, especially in high-performance computing (HPC). For one, high-performant code is often voluminous and obscure – and hence hard to write directly by hand.Footnote 1 It also has to be re-adjusted for each new architecture and processor configuration. Hopes that an optimizing compiler save us such trouble were dashed two decades ago: see [3] for exposition and many references.Footnote 2 The alternative to writing performant low-level code by hand or to relying on a general-purpose compiler is code generation. Being domain-specific, a generator may employ very profitable but not widely-applicable expert knowledge and optimizations. Such code generation is being increasingly used in HPC, becoming dominant in some areas: ATLAS [23] for BLAS (Basic Linear Algebra Subroutines), FFTW [6] and SPIRAL [17] for FFT and related transforms, Halide [18] for image filtering, Firedrake [19] for partial differential equations using finite element method.

The present paper describes two orthogonal approaches for generating C using OCaml, with some guarantees and convenience. We take OCaml as a representative high-level language for writing generators, and C as a representative low-level language. The approaches extend to other languages (e.g., [21] for LLVM IR).

The first (also historically, [5]) approach is offshoring: It is based on the idea of generating simple OCaml code with certain correctness guarantees, and then translating this subset of OCaml to C, preserving the guarantees. The second approach is embedding C in OCaml as typed combinators, in tagless-final style [2, 8]. Both ensure the generated C code compiles without errors or warnings. Both explore the metaphor of a subset of OCaml as a notation for C.

Offshoring is a particularly attractive idea – and has been implemented twice before, see Sect. 5. There are also pitfalls and challenges, which may explain why these implementations have become unmaintainable and are no longer available. The current implementation of offshoring in BER MetaOCaml is done from scratch, in complete re-design and re-thinking of the earlier implementations to clearly expose and address the challenges and ensure maintainability. It has been publicly available since 2018 (but privately, quite earlier) and used in several Master and Bachelor projects, among others, for generating performant OpenCL (GPGPU) [7] and OpenMP [1] code and for robot control code. Alas, there is no published description.

Our Contributions. (i) Explicate the challenges in generating low-level (C) code, many of which have only become clear from our experience; (ii) Present the implementations of the two approaches, which are freely available and have been used in practice, see Sect. 6; (iii) Describe how the two systems are designed to address or mitigate the challenges.

We concentrate on offshoring, with which we have more experience. The challenges and pitfalls are common to both approaches; the lessons learned with offshoring have directly carried towards the tagless-final embedding.

There are two sorts of challenges we have encountered in designing and using assured C code generation: technical and engineering.

Technically, the metaphor ‘simple OCaml as a notation for C’ does not actually hold: see, for example, control operators such as |break|, |continue| and |goto|, as well as the general for and do-while loops, which have no analogue in OCaml (Sect. 3.3). More subtly, and hence insidiously, is the difference between variables in C and variables of reference types in OCaml – which can easily lead to the generation of type-correct but ill-behaving code (Sect. 3.4). These problems make the embedding of C difficult, and all but doom offshoring.

The key idea reverberating throughout the paper is to keep the eyes on the goal: expressing an algorithm in an efficient-to-execute way – which can be done in a subset of C. For an example, see the manually written high-performance C code in OpenBLAS,Footnote 3 which is syntactically spartan. After all, C, as many languages, is redundant, with many ways of expressing the same algorithm. Some expressions may be more elegant or idiomatic – but we only care about efficiency. Covering all of C and generating every its construct is hence explicitly not the goal. A subset suffices, which is easier to put in correspondence with a subset of OCaml.

Among the engineering challenges is the unexpected need for type inference (Sect. 3.1), and, mainly, extensibility, Sect. 3.2. If a system is not easy to extend, it falls into disuse. We should be able to support architecture-specific types of C and its extensions (SIMD, CUDA, OpenMP, etc.) and generate code that interacts with external libraries.

The next section reminds the straightforward C generation, and the reasons one may quickly move beyond it. Section 3 describes the offshoring and its challenges, and Sect. 4 presents the tagless-final embedding of C. Section 6 briefly evaluates the usefulness and adequacy of the approaches and compares them.

MetaOCaml (which includes offshoring) is available from Opam,Footnote 4 among other sources (the current version is N111). The complete code for all examples and the tagless-final embedding are available at http://okmij.org/ftp/meta-programming/tutorial/0README.dr.

2 Prelude: Direct C Generation

What springs to mind when talking about code generation is directly emitting the code as strings. It also quickly becomes apparent why some abstractions and guarantees are desirable – as this section illustrates.

A notable example of directly emitting C code as strings is ATLAS [23]: a generator of automatically tuned linear algebra routines, which “is often the first or even only optimized BLAS implementation available on new systems and is a large improvement over the generic BLAS”.Footnote 5 Figure 1 shows a typical snippet of ATLAS code, itself written in C and using |fprintf| to generate C code. The variable |spc| is whitespace for indentation, and the variables |rC|, |i|, and |j| are combined to name identifiers declared elsewhere. Nothing guarantees that these identifiers are indeed all declared and the declarations are in scope – nor that the result is syntactically well-formed. Even if a |fprintf| output is syntactically correct, the presence of loops and branching in the generator makes it hard to see that the overall generated code will be too. (It is also not at all obvious that the generated code performs matrix multiplication.)

Fig. 1.
figure 1

A snippet of ATLAS: generation of the inner loop body for matrix-matrix multiplication

Syntactically incorrect generated code (as well as code with unbound variables, etc.) will certainly be discovered when trying to compile it. However, figuring out which part of the generator to blame and to fix is non-trivial – as, e.g., [15] confirm from their experience of debugging unbound variables in the generated code. Incidentally, the implementor of ATLAS himself is quite frustrated:

“As you have seen, this note and the protocols it describes have plenty of room for improvement. Now, as the end-user of this function, you may have a naturally strong and negative reaction to these crude mechanisms, tempting you to send messages decrying my lack of humanity, decency, and legal parentage to the atlas or developer mailing lists. ...So, the proper bitch format involves

  • First, thanking me for spending time in hell getting things to their present crude state

  • Then, supplying your constructive ideas”

(R. Clint Whaley: User contribution to ATLAS. Conclusion. 2012-07-10. math-atlas.sourceforge.net/devel/atlas_contrib/)

That is why some correctness assurances are needed.

What comes to mind next is a sort of an abstract syntax tree for C (several of which are available just in OCaml, see Sect. 5 for details). The generator then produces the tree data structure, which is pretty-printed into C code at the end. The pretty-printing ensures the result syntactically well-formed – but makes no further guarantees. One would have liked, at the very least, that the generated code compiles without errors or warnings and contains no problematic expressions, like the ones involving several increment operators. Full correctness of the generated code is hard to assure; however, at least the above problems, which often arise by misediting or typos, should be preventable.

3 Offshoring

As we have seen, emitting C code is better done not directly but through a level of abstraction that provides some guarantees. Which guarantees is an engineering decision, balancing against the ease of use and the implementation and maintenance effort.

One particularly attractive balance is offshoring [5]: treating a subset of OCaml as if it were a (non-canonical) notation for C. For example, consider the following OCaml code for vector addition (a typical BLAS operation):

figure a

One may even argue [5] that OCaml’s |addv| is C’s |addv|, written in a different but easily relatable way. Offshoring is the facility that realizes such correspondence between a subset of OCaml and C (or other low-level language). With offshoring, by generating OCaml we, in effect, generate C. Offshoring hence turns homogeneous metaprogramming into heterogeneous.

The first premise of offshoring is the ability to convert well-typed OCaml code into C code that surely compiles, without errors or warnings. Paper [5] formalized the OCaml subset-to-C translation and proved it type preserving. Bussone has formally (in Coq) shown the meaning preservation of the offshoring translation, in a yet unpublished work.Footnote 6

The second premise is the ability to produce OCaml with some correctness guarantees. It is fulfilled by MetaOCaml [9, 11], which generates OCaml code that surely compiles (meaning, i.a., it is well-typed and has no unbound variables). We now illustrate how it is all put together, continuing the vector addition example.Footnote 7

Definition 1 (Generating C via offshoring)

  1. 1.

    implement the algorithm in OCaml

  2. 2.

    stage it (add staging annotations) and generate possibly specialized code

    1. (a)

      test the generated OCaml code

  3. 3.

    convert the generated OCaml code to C, saving into a file

  4. 4.
    1. (a)

      compile and link the generated C code as ordinary C library code

    2. (b)

      compile the generated C code and (dynamically) link into an OCaml program, via an FFI such as [24].

Definition 1 presents the overall flow of offshoring. In our example, the first step has already been accomplished, by |addv| above. The next step is to turn it into a generator of OCaml vector addition, with the help of MetaOCaml’s staging annotations – specifically, so-called brackets |.<| and |>.|, which enclose the code to generate, the code template:

figure b

The result of evaluating is

figure d

It is a so-called code value: a value of the type that represents the generated OCaml expression, where is its type. Code values can be printed. In our case, the printout shows the original after desugaring and renaming of all identifiers. Such code value is the outcome of Step 2 of offshoring. Passing to the function , to be explained in detail later, accomplishes Step 3 and produces a file with exactly the C code shown earlier. It is an ordinary C code and can be linked with any C or other program that needs vector addition. It can also be called from OCaml, via an FFI such as [24].

Real-life use of offshoring is more interesting. For example, suppose that the size of vectors to add is known in advance, and is small enough to want to unroll the loop in . The generator of unrolled code cannot be obtained from merely by placing brackets; quite a few other modifications are required. It helps to generalize first:

figure o

where generates a list of integers through and |List.iter| performs a given action on each element of the list. We may stage it similarly to , by enclosing everything in brackets:

figure t

Passing this code value to , however, results in an exception: this code outside of domain of offshoring. It is not hard to see why: the argument of is a first-class function, which cannot be simply represented in C. If the array size is known at the generation time, a different placement of brackets becomes possible:

figure x

where

figure y

Here, besides brackets we used the other staging annotation, (pronounced ‘escape’) that denotes a hole in the code template (bracketed expression), to be filled by the code produced by the escaped expression. The function builds a sequence of code values. Evaluating gives the fully unrolled code:

figure ac

MetaOCaml offers the facility to compile and run this code, so we can test it on sample 4-element arrays and compare the result with or . It is easier to test OCaml code, if not for other reason that the out-of-bound indexing into an array results in an exception rather than undefined behavior.

Once we are satisfied that the generated OCaml code works, we pass it to obtaining:

figure ag

It is unsettling that can be easily staged and offshored to produce a for loop but not an unrolled loop; is the other way around. By generalizing a bit more:Footnote 8

figure ak

we not only express vector addition clearly but also, by appropriately instantiating , and combinators, may generate from the same expression a variety of programs (unrolled, not unrolled or partially unrolled for-loop) and apply strip mining and scalar promotion optimizations. App. A of the extended version of the paper, and addv.ml in the accompanying code show the details. We indeed produce the code with the look and feel of the HPC BLAS.Footnote 9

As we have seen for , offshoring applies only to a small imperative subset of OCaml. Therefore, offshoring is much simpler than an OCaml-to-C compiler, which must deal with the full language and support closures, tail recursion, GC, etc. None of this matters in offshoring, which hence produces C code that does not need any special run-time. Albeit simple, the offshorable subset of OCaml is adequate for numeric and embedded programming.

Although the offshorable subset is easy to define in theory (see [5]) it is hard to express in types, especially in the extant OCaml type system. Therefore, nothing actually prevents from being applied to the code outside the supported subset – in which case it throws an exception. It is not a soundness problem: soundness is about C code, if successfully produced, being well-formed and well-typed. The problem is that this exception is raised late, after the code to offshore has all been generated. MetaOCaml (OCaml, actually) supports location information, which could be used to emit detailed error messages (not implemented in the current version however). The best mitigation is to generate OCaml code not via brackets and escapes but via further abstraction layers (combinators), with OCaml type system enforcing abstraction – as shown App. A.

As attractive the metaphor of OCaml as C is, upon close inspection it breaks down, as the rest of Sect. 3 describes. Fortunately, it can eventually be held together, by workarounds and the design of the library that implements, enforces and steers towards the workarounds, and away from the problematic cases. As our refrain goes, the goal is to express an algorithm in C efficiently, not to generate idiomatic code and every C construct.

3.1 Type Inference

Looking closer at the earlier OCaml and the corresponding C code one notices that the correspondence is not as straightforward as one may have initially thought: the OCaml code mentions no types, whereas in C any declaration, of arguments and local variables, must be accompanied by their type. The need for type inference is an unpleasant surprise.

Fortunately, a MetaOCaml compiler is an extension of an OCaml and hence may use the OCaml type checker to infer types in the code to offshore. The result is an OCaml compiler internal data structure : type-annotated abstract syntax tree. Unfortunately, this data structure, and especially the type representation, is rather difficult to deal with: for example, checking if a type is is not a simple pattern-match but requires a sequence of obscure and undocumented internal compiler function calls.

For these reasons, the original implementation of offshoring [5] was tightly integrated with the OCaml type checker. Since the type checker (including the structure and often the type representation) notably change in every release of OCaml, the original offshoring almost immediately became unmaintainable and was removed when porting MetaOCaml to OCaml 3.12, which introduced especially many changes to the type checker.

The lesson was learned when resurrecting offshoring in BER MetaOCaml. The new offshoring is disentangled as much as possible from the OCaml type checker. The key idea is an intermediary language, Offshoring IR (Fig. 2), an abstraction barrier between and the rest of offshoring. In terms of Definition 1, Step 3 is hence split into two: converting the generated OCaml code to the Offshoring IR, and pretty-printing this IR as C code. The function |offshore| implements the first substep: takes the closed code value produced by MetaOCaml, invokes the OCaml type checker to infer types, and converts the resulting |typedtree| to the Offshoring IR. (The function’s first argument, , is explained in Sect. 3.2). It raises an exception if the input is outside the supported subset of OCaml. The function encapsulates all peculiarities of the OCaml type checker. When OCaml internal data structures change, only that function needs to be adjusted. It is engineered to be an ordinary library function, outside the OCaml compiler and using only what is exposed in compiler-libs library. The offshoring function also enforces soundness side conditions, described in Sect. 3.4.

Offshoring IR, Fig. 2, is a simple, typed, imperative language, with expressions |exp| and statements . The data structure represents the complete program, which is either a function returning a result or a procedure.

Continuing the example of vector addition from Sect. 3, the invocation produces the following IR code:

figure bb

The translation from OCaml to is really as straightforward as it looks from the example.Footnote 10 We see that in the IR, all identifier references and declarations are type-annotated, which makes it easy to produce C declarations later on. The local identifier names are all unique: courtesy of MetaOCaml. Therefore, no shadowing may occur – and identifier declarations may safely be lifted to a wider scope, which is sometimes necessary when emitting C, where all declarations must occur at the beginning of a block.

Fig. 2.
figure 2

The intermediate language (IR) of offshoring (defined in offshoringIR.mli)

Pretty-printing this Offshoring IR expression to C gives the code we have seen in Sect. 3. The pretty-printing is straightforward, in this case (we describe the complications later on). One may just as easily pretty-print the IR to other low-level imperative language, such as Fortran or LLVM IR. Extensibility was another design decision behind the IR.

3.2 Extensibility

The original offshoring [5] was not extensible at all: any change required recompiling of the entire MetaOCaml system. Extensibility is the must however. We should be able to accommodate the ever increasing assortment of integer and floating-point types of C as well as the short-vector types of various SIMD extensions. The generated C code often needs to interact with external libraries: therefore, we have to generate calls to library functions and deal with their data types. OpenCL/CUDA and OpenMP bring further challenges: generating pragmas, and annotations, and their own vector and scalar data types. All these extensions in the target of offshoring have to be representatable in its source, i.e., OCaml.

The current implementation of offshoring is designed for extensibility. Not only it is an ordinary library, which can be changed and recompiled independently of the MetaOCaml compiler. It is designed so that no recompilation should be needed. We illustrate using the example of offshoring the code to print an array of 32-bit floating-point numbers into a file. The example shows off calls to external library functions ( i/o of C) with their own data types (the pointer to the structure), as well as dealing with 32-bit floating-point numbers with no equivalent in OCaml (OCaml |float| corresponds to |double| in C). The example is designed to answer some of the most frequently asked questions about offshoring. The complete code accompanies the paper; Fig. 3 shows the salient parts.

The first puzzle is how to generate OCaml code that represents calls to external C functions and uses their data types – the functions that are not generally callable from OCaml as they are. The answer is to define a module to represent that external library: lines 1–11 of Fig. 3. For the purpose of offshoring, all the library data types can be abstract and all functions dummy: we only need their signatures.Footnote 11 The type |float32| (line 7) is the type of short floats, introduced by the Offshoring IR interface as an alias to OCaml’s |float| and translated to of .Footnote 12

With the module in scope we may generate code. Lines 13–19 show the result, the (well-typed) MetaOCaml code value. Passing it to from Fig. 2 ends in an exception however: “unknown type: file”. Indeed, the offshoring library knows nothing about this data type. We need to tell it. First, we add to the IR types the new base type (meant to correspond to of C). As is an extensible data type, its extension is as simple as line 21. Defining the correspondence between the OCaml type and the just introduced |TFile| is the job of the module, whose implementation is provided by default. Lines 23–32 extend this module: line 25 maps to (so that matches in the C standard library). Lines 28–30 specify that the names in the module (namespace) are to be understood as names: global identifiers. With thus set-up module Conv, offshoring succeeds and produces an IR program, which can then be straightforwardly pretty-printed to C ( combines the IR translation and pretty-printing). The result is shown at the end of Fig. 3.

The grouping of C library functions and types as an OCaml module, in the manner of , has an unexpected benefit: C gets a module system, which it never had.

Fig. 3.
figure 3

Extensibility example, slightly abbreviated

3.3 Control Structures: Loops and Exits

One place where C and OCaml differ significantly is control structures. Although while loop has the same syntax and meaning in both languages, do-while has no analogue in OCaml. The for loop is present in both, but rather restricted in OCaml: the loop variable must be an integer, stepping only by one, up or down. Since loops with an arbitrary stride are common in HPC, the offshoring library offers a workaround. It defines the function:

figure bz

It is an ordinary OCaml function and can be used as is, in OCaml and in the generated code, e.g.:

figure ca

Its applications, however, are translated to by a special rule, which is better understood by looking at the result of translating the two nested loops in :

figure cd

Pretty-printing the result as C for-loops is straightforward (see App. A).

The do-while can be supported similarly; however, it rarely occurs in practice – and it can always be converted to the ordinary while.

C also has , , and . In principle, one may define dummy OCaml ‘functions’ like , whose applications are pretty-printed as C in a special way. The code with those functions can only be offshored, not executed as OCaml. Mainly, nothing prevents using such ‘functions’ outside loop bodies, hence breaking the guarantee that the result of offshoring always compiles. A better idea is to introduce iteration combinators with an early exit, like above (Example: in Appendix).

3.4 Pointers and References

The metaphor of OCaml as C is strained the most when it comes to variables. In OCaml, names (variables) stand for values, which may be mutable-cell values. In C, ordinary variables always denote inherently mutable memory locations.

At first, the difference does not seem insurmountable. Compare

figure cl
figure cm

which suggests the correspondence in Table 1(a), to be used in offshoring. However, applying blindly this translation to

figure cn

gives

figure co

which has a very different meaning: in OCaml, returns , but in C, zero.

Table 1. Variables and pointers in OCaml vs. C (whereas in OCaml an introduced variable is immediately bound to a value, in C we split the declaration from initialization, because declarations have to be grouped at the beginning of a block.)

One modest proposal is to avoid ordinary mutable C variables completely – relying on array references instead. An array name in C refers to a mutable location, but itself is immutable – just like an OCaml variable of a reference type. The proposed translation is summarized in Table 1(b). With it, OCaml’s and are translated into

figure ct

The translation results in highly un-idiomatic C, but it compiles to the same machine code (gcc 8.2.2 -O2, x86_64): The efficiency does not suffer.

Although the proposal is attractive, BER MetaOCaml N111 implements a more conservative approach: using Table 1(a) but with side conditions that prohibit aliasing. (These side-conditions are used already in [5] but not described explicitly. They can be gleaned from typing rules in App. A2 of that paper.) That is, in the type t ref, must be a base type, and the RHS of a let-expression must not be an expression of a ref-type, with an exception of . In other words, when binding a reference-type OCaml variable, it should be clear, syntactically, that the mutable-cell value is fresh. The function that translates OCaml to checks these side conditions, raising an exception if they are violated. Although the side-conditions seem severe, (for example, preventing , which has to be written as ; both result in the same machine code however), from our experience generating (mostly numeric) code, they do not seem overly restrictive.

4 Tagless-Final Embedding

A different way to generate C from OCaml is to embed it in OCaml, in tagless-final style [2, 8]. This embedding is emphatically different from the mere representation of C AST in OCaml (Sects. 2, 5): the latter gives no assurances about well-typedness, absence of unbound variables or unexpected shadowing. Tagless-final embedding makes such assurances and hence guarantees that the generated C code compiles without errors or warnings.

In the tagless-final style, the embedded language is represented as a (multisorted) algebra whose operations are the syntactic forms of the language. The following collection of operations (an OCaml signature) represents a simple imperative language (not unlike the Offshoring IR, Fig. 2): a simple subset of C.

figure da

The body of the vector addition procedure (cf. in Sect. 3 and especially its result) has then the form

figure dc

(given , , , and in scope). This OCaml expression, of the type , clearly describes the for-loop of the vector addition. Once again we use OCaml as a metaphor for C. The embedding is typed: represents a C code expression (or statement, for ) of C type . The type of , for example, ensures that the loop condition is a comparison or dereference expression, and the loop body is a statement. A well-typed OCaml expression over the above signature thus represents well-typed C code.

An implementation of signature may realize as string, of C code. Then builds the code of while-loop from the code for loop condition and loop body. (The actual implementation uses the C AST, Sect. 5, pretty-printed at the end). Given an appropriate function header the above sample produces the code that looks quite like the C code in Sect. 3.

More extensive signature is in Appendix; it has been used in practice to generate stream-processing code. More operations can be added at any time: extensibility is the strongest point of the tagless-final style.

There are also complications, not unlike the ones described for offshoring. For example, can be used at several types, inferred by OCaml. Generating C variable declarations needs a run-time representation of that inferred type. Unlike offshoring, we implement our own inference (which is simple for a first-order language), relying on the OCaml type system to ensure OCaml’s and our run-time types agree. The implementation also enforces side-conditions in Sect. 3.4. Another challenge is the handling of local variables, which in C have to be all declared at the beginning of a block. The salient points of the implementation are summarized in App. B.Footnote 13 The accompanying code also shows the advanced example in App. A implemented in the tagless-final style.

This approach can be traced back to C-code–generating combinators in [3]. Those combinators were monomorphic, and the explicit passing of the C variable environment made them ungainly: Compare [3, Fig. 18]

figure dq

with in our approach. Ensuring well-scoping of the C code was also the responsibility of a programmer, to pass the environment in the disciplined way.

5 Related Work

Offshoring was first proposed in [5], which we have discussed already. Its implementation is no longer available. Asuna [22] was an attempt to resurrect offshoring and extend to SIMD extensions, parallelism and LLVM. The paper presented a few applications of offshoring HPC kernels, with few details about the implementation. Curiously the paper does not mention any restrictions on let-bindings in the source language (raising doubts about correctness). The implementation has not been available.

KreMLin [16] introduces Low*, a subset of F* that is easily mapped to a small subset of C. Because F* is a dependently-typed language, one may state and verify sophisticated correctness properties (including functional correctness). The paper proves that the mapping preserves not just typing but also semantics and side-channel resistance. Like in offshoring, the translatable subset of F* is not easy to state in types; therefore, C code extraction (‘offshoring’) is best effort. Also related is C code extraction from constructive Coq proofs – although Coq is quite harder to program in; it is also harder to control the form of the produced C code and ensure high performance.

Among other heterogeneous metaprogramming system we should mention MetaHaskell [13], LMS [20] and Terra [4] (which offers weak guarantees: it does not assure the absence of unbound variables).

FrontC by Hugues Cassé defines the abstract syntax for C, as an OCaml data structure, and includes the parser and the pretty-printer. It is used (with significant modifications) in CIL (C Intermediate Language)Footnote 14 [14] and Binary Analysis Platform.Footnote 15 Our abstract C syntax (produced by combinators in Sect. 4) is heavily influenced by Cassé’s, but re-written from scratch. FrontC is developed to represent any existing C program (so to analyze it). We are interested only in C generation, and so chose a small but just as expressive and ‘sane’ subset of C – quite in the spirit of CIL but with different design choices: In our subset, a declaration introduces only one variable. Mainly, we distinguish statements and expressions, and regard assignment and increment as statements. Therefore, such C constructions as and --x * y++ are not representable in our abstract syntax and hence never produced.

6 Evaluation and Conclusions

We have presented two implemented approaches for generating high-performance C code that compiles without errors or warnings and can be freely linked with other C libraries. Offering correctness guarantees requires generator abstractions, which is a challenge to design and maintain. We have described the notable problems we experienced and the ways we mitigated them. One of the main problems turns out maintainability. The current systems are explicitly designed to be extensible and to last.

Offshoring was used in [1] to generate OpenMP matrix-matrix multiplication code that is faster, sometimes 2\(\times \), than the state of the art BLAS code generated by ATLAS; tuning was also faster. In [7] offshoring has generated GPGPU (OpenCL) matrix-matrix multiplication and k-means clustering code. The tagless-final approach is used in the new version of [12], e.g., to generate C code for the FM radio application (\(\sim \)3,000 lines of code). Both approaches thus proved adequate for their intended tasks.

The two approaches share the metaphor of OCaml as C: representing a small imperative language (a subset of C) in the form of OCaml expressions. One approach (offshoring) relies on MetaOCaml, thus offering a better syntax for loops and control structures, let-insertion and type inference. The other approach uses the bare OCaml and is more portable and extensible.

The future work involves proving the meaning preservation, designing let-insertion for tagless-final embedding, and emitting better error messages.