1 Introduction

Parallel architectures are now everywhere, but parallel programmers are not. High-level programming abstractions and methods are needed, in particular for distributed memory models. Our goal is to provide a framework to ease the systematic development of correct parallel programs.

In the Bird–Meertens Formalism (BMF) [2], an efficient program is obtained from a naive functional program, considered as a specification, through program transformations. BMF can be applied to parallel programming (e.g. [5]). In these approaches, however, the transformations are pen-and-paper proofs, and the transition from an efficient functional expression to a parallel program (often in an imperative language, for example a library of algorithmic skeletons hosted in C++) is not grounded on a formal basis.

We develop the SyDPaCC framework [7, 14] for the Coq proof assistant [22] to ease the use of methods based on program transformation and algorithmic skeletons, and to make them more reliable. More specifically, the contributions of this paper are:

  • the support of list homomorphisms in SyDPaCC, which previously supported only BSP homomorphisms [14] and the GTA paradigm [7],

  • improvements to the formalisation of the parallel functional language Bulk Synchronous Parallel ML (BSML) [13] and of algorithmic skeletons in Coq,

  • an application of the framework to produce a parallel program for the maximum prefix sum problem.

We first give an overview of the features of Coq (Sect. 2). The paper does not assume any prior knowledge of Coq, but familiarity with functional programming is necessary. Then we present how a function written as a composition of sequential map and reduce can be automatically parallelised in SyDPaCC, based on a formalisation of BSML (Sect. 3) and on verified parallel versions of map and reduce (Sect. 4). In Sect. 5 we present the modelling in Coq of two important list homomorphism theorems, which are used to prove that a function having three simple properties can be expressed as a composition of sequential map and reduce and can therefore be parallelised. The framework is exemplified on the maximum prefix sum problem in Sect. 6. A comparison with related work (Sect. 7) and a conclusion (Sect. 8) close the paper.

2 An Overview of the Coq Proof Assistant

Coq [22] is an interactive theorem prover. It is based on the Curry–Howard correspondence, which relates terms of a typed \(\lambda \)-calculus (the calculus of (co-)inductive constructions) with proof trees of a logical system in natural deduction form. From a more practical point of view, Coq can be seen as a functional programming language, close to OCaml or Haskell but with a richer type system that makes it possible to express logical properties. In many cases, programs are developed in Coq and their properties are also proved in Coq. This is for example the case of the CompCert compiler, a compiler for the C language, implemented and certified using Coq [12]. Our SyDPaCC system also uses this style.

Fig. 1: Lists in Coq, Haskell and OCaml

In Coq, data structures are defined only by induction. For example, the list data structure is defined in Coq, Haskell and OCaml in Fig. 1. In all three cases, a list is built using either the constructor for the empty list (Nil), or the constructor that adds an element at the beginning of a given list (Cons). In Haskell and OCaml, a and 'a respectively are type variables. In the Coq version, A is simply a variable and we indicate that this variable is a Type.

In Coq, values and types can be mixed together, and types may depend on values. For example the definition for lists of a given size begins with:

figure b

In this definition A is a parameter: in the body of the definition, all occurrences of t must be applied to A itself. t has a second argument: a natural number indicating the size of the vector. As different values are passed for this argument in the declaration of t, it cannot be a parameter, hence its position after the : symbol. In summary, in new type declarations in Coq, the parameters and arguments of the type can be both types and values.
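
A plausible form of such a beginning, in the style of the standard library's Vector.t (the names below are therefore an assumption rather than the paper's exact listing), is:

  Inductive t (A : Type) : nat -> Type :=
  | nil  : t A 0
  | cons : A -> forall n : nat, t A n -> t A (S n).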

Figure 1 also shows the definitions of the map function in the three considered languages. All the definitions are recursive and proceed by pattern matching on the list argument. They are quite similar, except in their arguments: in Coq there are two arguments A and B that are types. Note that we could write

figure c

instead, but we prefer to let Coq infer the type. We also let Coq infer that the type of xs is list A. Considering the type arguments as regular arguments is very expressive. However, it may be verbose to explicitly apply a polymorphic function to the types it is parametric in. The Set Implicit Arguments command tells Coq to make this kind of argument implicit (and inferred by Coq). In the recursive call of map these arguments are actually omitted, and the code looks very similar to the Haskell and OCaml code.
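
As a hedged reconstruction of the Coq column of Fig. 1 (constructor names and layout are assumptions), the definitions could look as follows:

  Inductive list (A : Type) : Type :=
  | Nil  : list A
  | Cons : A -> list A -> list A.

  Set Implicit Arguments.

  Fixpoint map (A B : Type) (f : A -> B) (xs : list A) : list B :=
    match xs with
    | Nil         => Nil B
    | Cons x xs'  => Cons B (f x) (map f xs')   (* A and B are omitted in the recursive call *)
    end.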

If we ask the various systems for the type of map, Coq returns the type forall A B : Type, (A -> B) -> list A -> list B, whereas OCaml returns the type ('a -> 'b) -> 'a list -> 'b list, and Haskell returns (a -> b) -> [a] -> [b]. The forall x : A, B construction is a primitive one; it is called the dependent product. This type is the type of the expression fun x : A => e, where e has type B considering that x has type A. A -> B is syntactic sugar for it when B does not contain free occurrences of x.

It is also possible to define the usual OCaml (or Haskell) notations for lists in Coq. In the remainder of the paper we will use [] for the empty list, and :: as an infix notation for Cons.

Dependent product, function abstraction, inductive definitions and (dependent) pattern matching are the key constructions of Coq. Logical elements can be built using them. We refer to [22] for the details, and just show an example of a lemma statement from the Coq standard library about lists:

figure k

In this statement, \(\rightarrow \) may be thought of as logical implication rather than as a functional arrow (actually these notions are in correspondence through the Curry–Howard correspondence). The predicate In could be defined as:

figure l

In the Curry–Howard correspondence, a type and a logical statement are the same and a program and a proof are the same. The notation

figure m

indicates that the name in_inv has what follows the colon as its type. For the definition of a function or a value, the body of the definition (for example for the function map) follows the symbol \(:=\); it is actually a syntax-friendly \(\lambda \)-term of the calculus of inductive constructions. If we follow the Curry–Howard correspondence, the proof of the lemma in_inv could also be written as a \(\lambda \)-term/program. However, it is not convenient to do so, and Coq provides a language of tactics to write scripts that build \(\lambda \)-terms that are proofs. For in_inv one possible script is:

figure n
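
For concreteness, here is a hedged, self-contained reconstruction of In, of the statement of in_inv and of one possible proof script; the standard library's actual code may differ in detail:

  Fixpoint In {A : Type} (a : A) (l : list A) : Prop :=
    match l with
    | nil => False
    | b :: m => b = a \/ In a m
    end.

  Lemma in_inv :
    forall (A : Type) (a b : A) (l : list A),
      In b (a :: l) -> a = b \/ In b l.
  Proof.
    intros A a b l H. simpl in H. exact H.
  Qed.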

We will not explain what exactly this script means, but only emphasise that Coq enters an interactive proof mode that helps the user write the script, and that just before Qed a \(\lambda \)-term is built. The command Qed then checks that this term actually has the type given as the statement of the lemma in_inv. Only after this check is done is the lemma added to the Coq definitions. This means that the only part of Coq that should be trusted is the type-checking part: the kernel. Buggy tactics could build incorrect proofs, but these proofs would be rejected by the kernel.

One additional strength of Coq is that the functional programming realm and the logical realm can be mixed together. The simplest example of doing so is the dependent pair:

figure o

For a predicate P over values of type A, a value of type sig A P is a pair composed of a value a of type A together with a proof that P a holds. There exists a quite intuitive Coq notation for sig A P: {a : A | P a}. In particular, this type can be used to express pre- and post-conditions of functions in Coq. For example, the head of a list is only defined on non-empty lists, but only total and terminating functions can be defined in Coq. Therefore one can use a pre-condition and write:

figure q
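
A minimal sketch of such a definition follows; the concrete names and the exact shape of the pre-condition are assumptions, and we rely on the behaviour of Program described just below:

  Require Import Program.

  Program Definition head (A : Type) (xs : { l : list A | l <> nil }) : A :=
    match xs with
    | nil => !
    | x :: _ => x
    end.
  (* Program generates one obligation for the nil branch; as explained below,
     it can be discharged automatically: the list would be both empty and
     non-empty. *)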

Here we use the Program feature of Coq, which allows us to consider that xs has type list A in the body of the function, and to omit some parts of the body by writing the ! symbol on the right of => in the empty-list case. Program then generates a proof obligation for each hole in the body. In this case we do not have to write a proof script for the proof obligation because Coq proves it by itself. This proof is a proof by contradiction: xs is both empty (from the pattern matching) and non-empty (from the pre-condition).

Coq has a module system similar to OCaml’s. A module is a collection of inductive definitions, value and function definitions, theorems and so on. A module type is a collection of the same items that can be found in a module, plus more abstract elements: “parameters” that are names together with types but without a body. For a value or a function this corresponds to the usual notion of type signature. For a theorem, it means a theorem without a proof, in other words an axiom. The general syntax for parameters is Parameter name : type and a synonym is Axiom. An example of module type is given in Fig. 4.

A module realises a module type if it provides the bodies of all the parameters of the module type. A module can be parametric, i.e. it can depend on the implementation of one or several other modules. For example, a module implementing a data structure of sets using trees could depend on a module defining the type and ordering of the elements. The definition of such a module would look like

figure u

where OrderedType is a module type.
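
A hedged sketch of such a parametric module, using the standard library's OrderedType module type and hypothetical names:

  Require Import Coq.Structures.OrderedType.

  Module MakeSet (E : OrderedType).
    Definition elt := E.t.
    (* a tree-based implementation would use E.lt and E.compare here *)
  End MakeSet.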

To motivate the last Coq feature we need, let us consider join-lists. Join-lists and list homomorphisms suit the divide-and-conquer paradigm very well and are therefore very interesting as a formal basis for a parallel program development system. The SyDPaCC framework uses this kind of function as a foundation.

A join-list is a finite sequence of values of the same type. It can be: the empty list [], a singleton [a] (for some element a) or a concatenation \(xs +\!\!\!\!+\,ys\) of two lists xs and ys. The concatenation is associative and [] is its identity element. map and reduce can be defined on join-lists as:

$$\begin{aligned} \begin{array}{ll}  map ~f~[\,] = [\,] &{}  reduce ~\oplus ~[\,] = id_\oplus \\  map ~f~[x] = [f~x] &{}  reduce ~\oplus ~[x] = x \\  map ~f~(xs +\!\!\!\!+\,ys) = ( map ~f~xs) +\!\!\!\!+\,( map ~f~ys) &{}  reduce ~\oplus ~(xs +\!\!\!\!+\,ys) = ( reduce ~\oplus ~xs) \oplus ( reduce ~\oplus ~ys) \end{array} \end{aligned}$$

where \(\oplus \) is an associative operator with identity element \(id_\oplus \).

Of course, join-lists cannot be defined as such in Coq, because this definition imposes logical properties on the constructors; however, join-lists are isomorphic to cons-lists as defined in Fig. 1. Function \( map \) corresponds to the function map in Coq. For \( reduce \) there are two difficulties: first, we assume that \(\oplus \) and \(id_\oplus \) form a monoid, and second, reduce is defined, as is traditional in the Bird–Meertens Formalism, as a binary operation taking \(\oplus \) and a list. The first difficulty can be simply overcome: these logical properties could be added as a pre-condition to an argument containing both \(\oplus \) and \(id_\oplus \):

figure v

As A would be implicit, reduce would be a binary operation where the m argument is a complex value containing the binary operator, its identity element, and the fact that they form a monoid with A. A better solution would be:

figure w

in such a way that H is made implicit: the identity element of op and the fact they form a monoid with A should come from some information of the context where reduce is used.

Monoid could be defined as a conjunction of properties about op and e. A convenient way to do so is to use a record that is in fact syntactic sugar for an inductive definition plus the dot notation to access the fields:

figure x
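
A hedged sketch of such a record (the field names are assumptions):

  Record Monoid {A : Type} (op : A -> A -> A) (e : A) : Prop := {
    monoid_assoc     : forall x y z, op x (op y z) = op (op x y) z;
    monoid_neutral_l : forall x, op e x = x;
    monoid_neutral_r : forall x, op x e = x
  }.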

Note that all the fields are logical statements. Therefore we use the proof mode to define a value of type Monoid:

figure y

It is possible to use curly braces { } instead of parentheses ( ) around a binder in a definition to make the corresponding argument implicit. However, even if we do that in the proposed signature of reduce, Coq will have no way to infer this implicit argument. The necessary Coq feature is type classes. A class is similar to a record, but values of a class are defined with the keyword Instance, which stores them in a database that is queried by the Coq inference mechanism when it encounters an implicit argument whose type is a class.

The class Monoid only differs from the record Monoid in using the command Class instead of Record. Then we can define instances as follows:

figure ab

where app is the concatenation of lists as defined in the Coq standard library.

One possible definition for reduce is then:

figure ac

It is possible to compute in Coq:

figure ad

Coq answers

figure ae

to this last command. First note that reduce is used as a binary operation and that its first argument is itself a binary operator (here the addition on natural numbers). To infer the implicit arguments, Coq detects that the argument named H has as its type the type class Monoid. Therefore it looks into its database of instances: it finds the last defined instance that has plus as its operator, in our setting the instance m_plus_O. This instance provides e, and therefore all the implicit arguments are inferred.
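
As a concrete illustration of the whole mechanism, here is a hedged, self-contained sketch of the Monoid class, of two instances, of reduce and of a computation; all names are assumptions and the sketch targets a recent Coq rather than the exact code of the paper:

  Require Import Arith List.
  Import ListNotations.

  Class Monoid {A : Type} (op : A -> A -> A) (e : A) : Prop := {
    monoid_assoc     : forall x y z, op x (op y z) = op (op x y) z;
    monoid_neutral_l : forall x, op e x = x;
    monoid_neutral_r : forall x, op x e = x
  }.

  (* Nat.add is the modern name of plus *)
  Instance m_plus_O : Monoid Nat.add 0.
  Proof.
    split.
    - intros x y z; apply Nat.add_assoc.
    - intros x; apply Nat.add_0_l.
    - intros x; apply Nat.add_0_r.
  Qed.

  Instance m_app_nil (A : Type) : Monoid (@app A) (@nil A).
  Proof.
    split.
    - intros x y z; apply app_assoc.
    - intros x; apply app_nil_l.
    - intros x; apply app_nil_r.
  Qed.

  (* Only the binary operator is explicit: the identity element and the
     monoid proof are found by instance resolution. *)
  Definition reduce {A : Type} (op : A -> A -> A) {e : A} {H : Monoid op e}
    (l : list A) : A := fold_right op e l.

  Eval compute in reduce Nat.add [1; 2; 3].   (* evaluates to 6 : nat *)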

3 Bulk Synchronous Parallel ML in Coq

An Overview of BSML Bulk Synchronous Parallel ML, or BSML, is a functional programming language currently implemented as a library for the functional programming language OCaml. The BSML library [13] provides two different implementations of the BSML primitives, a sequential implementation and a parallel implementation on top of MPI, as well as a standard library of parallel functions. Compared to a full language, the library does not provide the specific type system that is useful to ensure some properties of BSML programs [10]. We present here the classic syntax of BSML. A revised, friendlier syntax exists, but as programming in this style relies on OCaml pre-processing, it is not possible to have it in Coq.

BSML is based on the BSP model [23] and there is a cost model for each primitive; however, for the sake of conciseness, we omit the costs in this presentation, as our framework currently does not use them. A BSP computer is a set of processor and memory pairs interconnected through a network, together with a global synchronisation unit. A BSP program is a sequence of super-steps, each one being composed of: a pure computation phase where processors compute using only the data they have in local memory, a communication phase where processors exchange data, and a synchronisation barrier that ensures that all data exchanges are completed during the super-step.

BSML offers a global view of programs: a parallel program is structured as a usual sequential program, but it operates on a parallel data structure through dedicated primitives. This parallel data structure is called a “parallel vector”. A parallel vector is composed of p values, one per processor, where p is the number of processes of the underlying BSP computer; p does not change during the execution of a program. From the typing point of view, a parallel vector has type 'a par: all the processors hold values of the same type 'a. Nesting of parallel vectors is not allowed, i.e. 'a cannot itself contain a parallel type: this is one of the properties enforced by the BSML type system. In the remainder of the paper, we informally write \(\langle \,v_0\;,\ldots ,\;v_{p-1}\,\rangle \) for a parallel vector.

BSML provides four constants to access the BSP parameters of the parallel machine. The only one used in this paper is bsp_p, the number of processors of the BSP computer. BSML also provides four primitives to manipulate parallel vectors. mkpar and apply are evaluated in the pure computation phases of BSP super-steps, whereas put and proj need a full super-step to be evaluated; in particular, there is an implicit synchronisation barrier at the end of put and proj. The informal semantics of the BSML primitives is given in Fig. 2.

Fig. 2: BSML primitives (informal semantics)
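
Writing \(\langle \,v_0\;,\ldots ,\;v_{p-1}\,\rangle \) for a parallel vector, the informal semantics of the four primitives can be summarised by the following equations (a reconstruction consistent with the descriptions below):

$$\begin{aligned} \begin{array}{lcl}  mkpar ~f &{}=&{} \langle \,f~0\;,\ldots ,\;f~(p-1)\,\rangle \\  apply ~\langle \,f_0\;,\ldots ,\;f_{p-1}\,\rangle ~\langle \,v_0\;,\ldots ,\;v_{p-1}\,\rangle &{}=&{} \langle \,f_0~v_0\;,\ldots ,\;f_{p-1}~v_{p-1}\,\rangle \\  proj ~\langle \,v_0\;,\ldots ,\;v_{p-1}\,\rangle &{}=&{} \lambda i.\,v_i \\  put ~\langle \,f_0\;,\ldots ,\;f_{p-1}\,\rangle &{}=&{} \langle \,\lambda i.\,f_i~0\;,\ldots ,\;\lambda i.\,f_i~(p-1)\,\rangle \end{array} \end{aligned}$$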

mkpar is used to create a parallel vector from a function describing its components. For readers familiar with the OCaml standard library, it is similar to the function Array.init but without the first argument (the size of the created array), as the size of parallel vectors is always bsp_p. OCaml being a higher-order functional language, it is possible to have a parallel vector of functions. Such a parallel vector cannot be applied to values, as a parallel vector of functions is not a function; thus BSML provides the primitive apply to apply a parallel vector of functions to a parallel vector of values.

proj is the inverse of mkpar: it creates a function from a parallel vector. Note that as processor names are represented by values of type int in OCaml, proj is only the inverse of mkpar on the interval \([0, p-1]\). The communication pattern of proj is a total exchange (each processor has to make its value available to all the other processors). For more finely tuned communication patterns, BSML provides put. This primitive takes as input a parallel vector of functions describing the messages to be sent to the other processors. For example, if at processor 2 this parallel vector contains a function \(f_2\), and \(f_2~4\) returns a certain value v, it means that processor 2 will send the message v to processor 4. The result of put is also a parallel vector of functions, but these functions describe the messages received by the processors. Continuing the same example, after the put has been performed, the function \(g_4\) at processor 4 is such that \(g_4~2\) returns v: processor 4 received the message v from processor 2. By convention, the first constant constructor of a type is considered to represent the absence of a message and therefore does not incur any communication. The empty list is such a value.

BSML in Coq There are two main ways to model the semantics of programming languages in a proof assistant: shallow embedding and deep embedding [24]. Deep embedding is well suited to reasoning about general properties of the language, such as the fact that typing is preserved by reduction/evaluation. However, it is much less convenient for reasoning about individual programs.

When the base language is a functional one, another possibility is to use the proof assistant as a functional language, and to add the new primitives and their semantics as signatures and axioms: this is a shallow embedding. In this setting, one cannot reason about general properties of the language, as it would mean reasoning about the interactive prover itself, but it is much more convenient for writing and reasoning about individual programs. All the libraries of the proof assistant can be used.

For modelling BSML in Coq we follow the shallow embedding approach. Moreover, rather than adding signatures and axioms at the top level of Coq, which may lead to logical inconsistencies, we write a module type. All the applications of BSML in Coq are then written as parametric modules that take an implementation of such a module type. After extraction from Coq to OCaml, we obtain OCaml functors that we apply to the module implementing BSML: either the sequential implementation or the parallel implementation. Note that we also provide an implementation of this module type in Coq: it is meant to check that the axioms we provide are consistent with Coq's logic, and as a by-product it is a verified sequential implementation of BSML.

The presented formalisation improves on previous ones [9, 21]: it is defined as a module type, and it avoids relying too heavily on sigma types. Proofs using this model are easier to carry out than with the previous ones.

Fig. 3: BSP parameters in Coq

Fig. 4: BSML primitives in Coq

First, there is a module type for BSP parameters (Fig. 3); we only consider the bsp_p parameter here, and we assume that the BSP machine has at least one processor. The module type BSML models the BSML primitives. It contains a module Bsp of module type BSP_PARAMETERS and the following notation:

figure bc

meaning that a pid is a natural number n together with a proof that n is strictly smaller than Bsp.p.

figure bd

is a function testing the strict ordering of two natural numbers. The remainder of the module type BSML is given in full in Fig. 4.

Section Parallel_vectors models the type of parallel vectors. We call it par, as in the OCaml implementation of BSML. It is an abstract type, and its notation differs from that of an abstract type in OCaml: it takes as input a type (the type of the sequential components) and returns the type of parallel vectors over this type. In order to model the semantics of the BSML primitives, we need to be able to describe the value at a specific processor in a parallel vector: that is what the get operation does. Note that this operation is not a programming primitive of BSML; it should only be used in the logical parts. The axiom associated with this operation states that two parallel vectors are equal if and only if their components are equal. Some representations of parallel vectors (for example functions) may not satisfy this property, so it is important to state explicitly that we want it to hold.

Section Primitives describes the BSML primitives. For each primitive we give its signature, in a style very close to its OCaml signature except for the handling of polymorphism. The semantics of each primitive is given by an axiom whose name ends with _spec. For example, the axiom of mkpar,

figure be

clearly states that for all process identifiers i, the value held at processor i in parallel vector mkpar f is equal to f i. It is exactly a quantified version of the informal semantics of mkpar shown in Fig. 2.
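
A hedged sketch of the shape of these module types follows; the names, the exact form of pid and of the axioms are assumptions, and the real Figs. 3 and 4 also cover put and proj:

  Module Type BSP_PARAMETERS.
    Parameter p : nat.          (* bsp_p *)
    Axiom p_positive : 0 < p.   (* at least one processor *)
  End BSP_PARAMETERS.

  Module Type BSML.
    Declare Module Bsp : BSP_PARAMETERS.

    (* a processor identifier is a natural number smaller than Bsp.p *)
    Definition pid := { n : nat | n < Bsp.p }.

    (* abstract type of parallel vectors *)
    Parameter par : Type -> Type.

    (* get is only used in specifications, not in programs *)
    Parameter get : forall {A : Type}, par A -> pid -> A.
    Axiom par_eq :
      forall (A : Type) (v w : par A),
        (forall i : pid, get v i = get w i) -> v = w.

    Parameter mkpar : forall {A : Type}, (pid -> A) -> par A.
    Axiom mkpar_spec :
      forall (A : Type) (f : pid -> A) (i : pid),
        get (mkpar f) i = f i.

    Parameter apply : forall {A B : Type}, par (A -> B) -> par A -> par B.
    Axiom apply_spec :
      forall (A B : Type) (vf : par (A -> B)) (v : par A) (i : pid),
        get (apply vf v) i = (get vf i) (get v i).
  End BSML.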

Using this module type, it is possible to write BSML programs in Coq and to reason about their semantics using the specifications of the BSML primitives. One possible application is to program BSML algorithmic skeletons in Coq.

4 Algorithmic Skeletons and Automatic Parallelisation in Coq

Algorithmic Skeletons in Coq Writing algorithmic skeletons using the BSML primitives in Coq is actually very close to doing the same in OCaml with BSML. We illustrate this with the implementation of the map and reduce skeletons. To implement programs using the BSML primitives, one needs to implement a parametric module taking as argument a module that implements the BSML primitives:

figure bf

We use two utility functions for that:

figure bg

For these functions we follow the same modelling convention as for the primitives, but here the terms are given: we first define a function and then a lemma describing its specification. We omit the proof scripts; using automated tactics, they are very short. It is then easy to define a parallel map on distributed lists (here, distributed lists means parallel vectors of lists):

figure bh

where

figure bi

is a tail recursive version of map. The parallel reduce can be defined using the proj primitive for communications:

figure bj

In the previous definition, the backquote symbol ` in a binder of the form `{ty} indicates that the free variables in ty should be generalised, i.e. introduced as additional implicit arguments of the definition. In this version of par_reduce the final result is not a parallel value. We can also have an alternative version, where the result is a parallel vector with the same value on each of the processors. We choose here a simple algorithm, using put:

figure bn

In this alternative version, the return type is a parallel vector, and we use the ability of Coq to express post-conditions: this vector has the same value on all the processors. The Program feature of Coq allows to define the function as if no post-condition was present. It generates a proof obligation (corresponding here to this post-condition). The proof script is omitted here, but is short.
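
As a concrete illustration, here is a hedged sketch of a parallel map written as a functor over the BSML module type sketched in Sect. 3; it uses the standard library's map instead of the framework's tail-recursive map_tr, and all names are assumptions:

  Require Import List.

  Module MakeSkeletons (Import Bsml : BSML).

    Definition par_map {A B : Type} (f : A -> B) (v : par (list A))
      : par (list B) :=
      apply (mkpar (fun _ => List.map f)) v.

  End MakeSkeletons.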

To verify that the implementations correspond to the semantics we have in mind, we could write _spec lemmas associated with each of these functions. However, it is not convenient to express the specification of par_map and par_reduce by describing the content of their output at each processor. As these functions are algorithmic skeletons, we prefer instead to express their correspondence with sequential functions.

Type and Function Correspondences To do so, we rely on two main type classes (defined in the Core.Parallelisation library of SyDPaCC). First we need to express that a parallel type is a correct parallelisation of a sequential type.

Definition 1

(Type correspondence) There is a correspondence between a type A and a parallel type \(A_{p}\) (i.e. a type that contains at least one par) if and only if there exists a total surjective function \(join_{A}: A_{p}\rightarrow {}A\).

The totality of the function ensures that we can construct a sequential value for any value in \(A_{p}\) and its surjectivity ensures the existence of at least one parallel representation for any sequential element.

In Coq we formalise these notions as type classes:

figure bo
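
A hedged sketch of such a class, following Definition 1 (class and field names are assumptions; totality is automatic for Coq functions, so only surjectivity needs to be stated):

  Class TypeCorr {A Ap : Type} (join : Ap -> A) : Prop :=
    join_surjective : forall a : A, exists ap : Ap, join ap = a.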
Fig. 5: Type correspondences

Examples of type correspondences are given in Fig. 5. In module ParList, the join function transforms a parallel vector of lists into a list. This sequential list also provides a “global view” of the parallel vector of lists: we think of the parallel vector as a big list obtained by concatenating the lists held by the processors, in increasing order of processor identifiers. The proof that join is surjective is quite easy: for a sequential list, it is enough to build the parallel vector that contains this list at one specific processor (for example the first one) and empty lists on all the other processors.

In module ReplicatePar, join takes as input parallel vectors that contain the same value at all the processors, and returns this value. It is straightforward to show join is surjective, as it is enough to replicate the sequential value.

Using type correspondence, we can define a relation between sequential and parallel functions.

Definition 2

(Function correspondence) Given two types A and B, and their parallel correspondents \(A_{p}\) and \(B_{p}\), there is a correspondence between the sequential function \(f:A\rightarrow {}B\) and the parallel function \(f_p: A_p\rightarrow {}B_p\) if and only if \(join_B\circ {}f_p = f \circ {}join_A \).

In other words, the diagram formed by f, \(f_p\), \(join_A\) and \(join_B\) commutes.

This notion is also modelled as a type class:

figure bp
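
A hedged sketch of the corresponding class, following Definition 2 (names are assumptions):

  Class FunCorr {A Ap B Bp : Type}
        (joinA : Ap -> A) (joinB : Bp -> B)
        (f : A -> B) (fp : Ap -> Bp) : Prop := {
    fun_corr : forall xp : Ap, joinB (fp xp) = f (joinA xp)
  }.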

Of course, a parallel function may take a sequential value as input and return a parallel value, or may take a parallel value as input and return a sequential value (as the first version of par_reduce does). In these cases the commuting diagrams are adapted accordingly, and the type classes are:

figure bq

There are two kinds of instances for the function correspondences: general ones, and instances stating the correspondence between a specific parallel function and a specific sequential function. The former are also given in the library Parallelisation; the latter come with the skeleton definitions. There are several general instances that state facts about the correspondence of compositions. For two instances of FunCorr, we have for example:

figure br
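
Such a composition instance could have roughly the following shape; this is a sketch reusing the FunCorr class sketched above, not the framework's actual statement:

  Instance funcorr_compose
      {A Ap B Bp C Cp : Type}
      {joinA : Ap -> A} {joinB : Bp -> B} {joinC : Cp -> C}
      {f : A -> B} {g : B -> C} {fp : Ap -> Bp} {gp : Bp -> Cp}
      (Hf : FunCorr joinA joinB f fp)
      (Hg : FunCorr joinB joinC g gp) :
      FunCorr joinA joinC (fun a => g (f a)) (fun ap => gp (fp ap)).
  Proof.
    destruct Hf as [Hf], Hg as [Hg].
    constructor; intro xp; simpl.
    rewrite Hg, Hf; reflexivity.
  Qed.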

The composition of the two sequential functions corresponds to the composition of the two corresponding parallel functions. Note that the directions of the arrows in our diagrams are important to have such a compositional notion.

Now that we can express the correspondence between a sequential function and a skeleton, we can do it for par_map:

figure bs

This piece of code assumes that the instance defined in the module ParList described above (Fig. 5) is in the context. It is this type correspondence that is used both for input and output.

In the case of par_reduce, the same type correspondence is used only for input, as the output is the same type A in the sequential and parallel cases:

figure bt

The correctness of the last skeleton

figure bu

uses the type correspondence defined in the module ReplicatePar (Fig. 5):

figure bv

Note that as the instances we have do not overlap, the instance resolution mechanism has no problem finding the instance we expect. In a richer library with overlapping instances (e.g. with different joins on parallel vectors of lists), the function correspondence might need to be specified more carefully by the user (i.e. some implicit arguments might need to be made explicit). However, as the library is organised in a modular way, the user has control over the visibility of overlapping instances.

Automatic Parallelisation If a sequential function h is defined as a composition of maps and reduces, the instances we defined can actually serve as an automatic parallelisation mechanism. Consider the following function:

figure bw
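
A hedged sketch of such a function, reusing the FunCorr class sketched above (all names are assumptions): given the sequential function f, the implicit fp and the FunCorr instance are found by instance resolution, and the function simply returns fp.

  Definition parallel {A Ap B Bp : Type}
             {joinA : Ap -> A} {joinB : Bp -> B}
             (f : A -> B)
             {fp : Ap -> Bp} {H : FunCorr joinA joinB f fp} : Ap -> Bp := fp.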

This function seems uninteresting, as the result it returns is one of its arguments. However, all its arguments except the sequential function f are implicit and are instances of type classes. It means that

figure bx

launches the instance resolution mechanism of Coq. Function h will first be decomposed if it is a composition, until instances of correspondence between a sequential function and a skeleton are found, and the parallel version par_h of h will be automatically built as a composition of skeletons.

The last component of the framework we need is thus a way to prove that certain classes of functions can be written as compositions of sequential functions that correspond to skeletons. The Bird Meertens Formalism, with the theory of list homomorphisms is a convenient formalism for this purpose.

5 List Homomorphisms and Their Theorems in Coq

In our parallelisation framework, we need functions defined as compositions of map and reduce. We consider the class of \(\oplus \)-homomorphic functions.

Definition 3

(\(\oplus \)-homomorphic) A function h on lists is \(\oplus \)-homomorphic if for all lists x and y, \(h(x+\!\!\!\!+\,y) = (h~x)\oplus (h~y)\).

Fact 1

If h is \(\oplus \)-homomorphic, then \(( img \,h,~\oplus ,~h\,[\,])\) is a monoid, where \(\oplus \) is restricted to \( img \,h\).

Note that this is not true in general on the codomain of h. There are two theorems that are useful for our framework: the first and the third homomorphism theorems [11].

Theorem 1

(Third Homomorphism Theorem) If a function h is both:

  • \(\oplus \)-leftwards, i.e. there exists \(\oplus \) such that \(h([x]+\!\!\!\!+\,xs) = x\oplus (h~xs)\),

  • \(\otimes \)-rightwards, i.e. there exists \(\otimes \) such that \(h(xs+\!\!\!\!+\,[x]) = (h~xs)\otimes x\),

then there exists \(\odot \) such that h is \(\odot \)-homomorphic.

Theorem 2

(First Homomorphism Theorem) If h is \(\odot \)-homomorphic then \(h = ( reduce \,\odot )~\circ ~( map \,f)\) where \(\forall x, f x = h [x]\).

In the first homomorphism theorem, we consider the restriction of \(\odot \) to the image of h, as in general \(( codomain \,h,~\odot ,~h\,[\,])\) is not a monoid (and \( reduce \) requires a monoid).

In Coq, we model the homomorphic property as a class:

figure by
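
A hedged sketch of such a class (names are assumptions):

  Require Import List.

  Class Homomorphic {A B : Type} (h : list A -> B) (op : B -> B -> B) : Prop :=
    homomorphic : forall xs ys : list A, h (xs ++ ys) = op (h xs) (h ys).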

Fact 1 is actually not trivial to prove in Coq. First we need to define the image of h, as well as definitions to build the restriction of \(\oplus \) (named op from now on) to the image of h:

figure bz

Fact 1 is then written:

figure ca

Actually, we need a slightly more general result, where we replace the restriction of the operator by a function that is extensionally equivalent:

figure cb

This allows to replace the output of the third homomorphism theorem by a simplified version of the binary operator.

To express the first homomorphism theorem, we define a function that produces the composition of map and reduce from an op-homomorphic function and possibly optimised versions of op and of

figure cc

:

figure cd

There are default instances of the Optimised_op and Optimised_f that relate op and

figure ce

to themselves. We omit the details of the optimisation aspects and refer to the source code of the framework. The theorem then checks that this function produces a function equal to h:

figure cf

The classical proof of the third homomorphism theorem relies on the notion of weak right inverse, i.e. a function \(h'\) such that for all x, \(h\,x =h(h'(h\,x))\). In [11], a lemma states that for every computable total function h with enumerable domain, there exists a weak right inverse of h. The proof however is the following: “To compute \(g\,t\) for some t simply enumerate the domain of h and return the first x such that \(h\,x = t\). If t is in the range of h then this process terminates.” The problem with this proof is that the function g may not terminate, and it is not possible to define non-terminating functions in Coq. Moreover, even if it could be defined, its efficiency would in general be catastrophic.

Therefore we state and prove a weak form of the third homomorphism theorem using the following classes:

figure cg
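
A hedged sketch of such classes (names are assumptions; the operator types follow the leftwards/rightwards equations of Theorem 1):

  Require Import List.
  Import ListNotations.

  Class Leftwards {A B : Type} (h : list A -> B) (op : A -> B -> B) : Prop :=
    leftwards : forall (x : A) (xs : list A), h (x :: xs) = op x (h xs).

  Class Rightwards {A B : Type} (h : list A -> B) (op : B -> A -> B) : Prop :=
    rightwards : forall (xs : list A) (x : A), h (xs ++ [x]) = op (h xs) x.

  (* weak right inverse: h' only has to be a right inverse on the image of h *)
  Class RightInverse {A B : Type} (h : list A -> B) (h' : B -> list A) : Prop :=
    right_inverse : forall xs : list A, h xs = h (h' (h xs)).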

To parallelise a function h, we should prove instances of Rightwards h oplus, Leftwards h otimes and

figure ch

and call the hom_to_map_reduce function on h. The instance resolution mechanism of Coq then produces an instance of Homomorphic for h using the third homomorphism theorem.

The obtained composition could then be parallelised using parallel (or variant left_parallel) explained in the previous section. We illustrate this process on an application example in the following section.

6 An Example: Maximum Prefix Sum

The goal is to obtain a parallel function that solves the maximum prefix sum problem. An example of evaluation of the sequential function \( mps \) follows (the prefix whose sum is maximum is underlined): \( mps ~[\underline{1; 2; -1; 2; -1; 3}; -4] = 6\).

A trivial solution that we consider to be a specification follows, where prefix returns the list of all the prefixes of its input and t is the type of an abstract representation of mathematical integers (module type Number):

figure ci
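
As a hedged illustration, here is a concrete, Z-valued variant of such a specification; the framework's version is stated against an abstract Number module type and uses a reduce restricted to non-empty lists for maximum, so all names and details below are assumptions:

  Require Import ZArith List.
  Import ListNotations.
  Open Scope Z_scope.

  (* all the prefixes of a list, including the empty prefix *)
  Fixpoint prefix {A : Type} (xs : list A) : list (list A) :=
    match xs with
    | [] => [[]]
    | x :: xs' => [] :: map (fun p => x :: p) (prefix xs')
    end.

  Definition sum (xs : list Z) : Z := fold_right Z.add 0 xs.

  (* maximum of a non-empty list; the [] case is unreachable for outputs of
     prefix, and the framework uses a non-emptiness pre-condition instead *)
  Definition maximum (xs : list Z) : Z :=
    match xs with
    | [] => 0
    | x :: xs' => fold_left Z.max xs' x
    end.

  Definition mps_spec (xs : list Z) : Z := maximum (map sum (prefix xs)).

  Eval compute in mps_spec [1; 2; -1; 2; -1; 3; -4].   (* evaluates to 6 *)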

maximum is not defined on the empty list: we use a version of reduce that also requires its argument to be a non-empty list. The property NonEmpty is a class:

figure cj
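
A hedged sketch of such a class (the framework's instances for prefix and map mentioned below are omitted here):

  Class NonEmpty {A : Type} (xs : list A) : Prop :=
    non_empty : xs <> nil.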

The variants of composition handle the additional arguments concerning non-emptiness. This is transparent to the user as there are instances of the NonEmpty class that state that prefix always returns a non-empty list and that map preserves non-emptiness.

mps_spec is not leftwards. But it is possible to tuple mps_spec with sum:

figure ck

ms_spec is opl-leftwards and opr-rightwards:

figure cl
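
A hedged sketch of the tupled function and of the two operators, reusing the Z-valued mps_spec and sum sketched above; the names ms_spec, opl and opr follow the text, the bodies are assumptions:

  Require Import ZArith.
  Open Scope Z_scope.

  Definition ms_spec (xs : list Z) : Z * Z := (mps_spec xs, sum xs).

  (* opl prepends an element, opr appends one; with these definitions one can
     check that ms_spec (x :: xs) = opl x (ms_spec xs) and
     ms_spec (xs ++ [x]) = opr (ms_spec xs) x. *)
  Definition opl (x : Z) (ms : Z * Z) : Z * Z :=
    let (m, s) := ms in (Z.max 0 (x + m), x + s).

  Definition opr (ms : Z * Z) (x : Z) : Z * Z :=
    let (m, s) := ms in (Z.max m (s + x), s + x).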

The proofs of the instances are about 20 lines long and they both use two 10-line lemmas about maximum and sum. Then we need to find a weak right inverse of ms_spec:

figure cm
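
A hedged sketch of a candidate weak right inverse over the Z-valued variant sketched above:

  Require Import ZArith List.
  Import ListNotations.
  Open Scope Z_scope.

  (* For a pair (m, s) in the image of ms_spec (so 0 <= m and s <= m),
     the two-element list [m; s - m] has maximum prefix sum m and sum s. *)
  Definition ms_inv (ms : Z * Z) : list Z :=
    let (m, s) := ms in [m; s - m].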

The third homomorphism theorem is applied to show that ms_spec is \(\odot \)-homomorphic, where \(\odot =\)

figure cn

. In the pen-and-paper proof, usually one shows that \((a_m,a_s) \odot (b_m,b_s) = (0\uparrow a_m \uparrow (a_s + b_m), a_s + b_s)\) where \(\uparrow \) returns the maximum of two numbers. Using the Optimised_op class, it is also possible to do it in Coq, in a process similar to the pen-and-paper proof (i.e. the final version is not known before the proof starts). This requires about 30 more lines using several short lemmas on mps_spec and sum.

To parallelise, we need to instantiate the parametric modules presented in the previous sections and then call the left_parallel function (or the parallel function, using ReplicatePar in addition to ParList). Finally, to obtain only the mps component of the result, we compose with projections (Fig. 6).

Fig. 6: Parallelisation

The Coq code can then be extracted towards OCaml. The extraction mechanism removes the logical parts to keep only the computational parts. What we obtain is an OCaml functor:

figure co

A parallel implementation on top of MPI results from applying this functor to a Bsml module that is a wrapper around Bsmlmpi, the uncertified implementation of BSML in OCaml, C and MPI. The main difference between Bsml and Bsmlmpi is that in the former pid is the type nat, whereas in the latter it is the type int. As we used an abstract representation of integers, we also need to provide an OCaml module that follows the extraction of the module type Number (basically a type for numbers plus basic operations on them). The application module would look as follows:

figure cp


The framework and results of scalability experiments are available online. The framework only requires Coq 8.4, OCaml 4 or above, and an MPI library.

7 Related Work

To our knowledge, SyDPaCC is the only approach in which actual source code of scalable and correct parallel programs can be obtained from a development in a proof assistant. The theory of constructive parallel algorithms provides various ways to ease the systematic development of correct parallel programs. However, the transformations are not mechanically verified, and the transition from an efficient functional expression to a parallel program does not rest on a formal basis. Most often, with the exception of [1, 4], the semantics of algorithmic skeletons remains informal.

There exist several logics to reason about BSP programs, for example [19], but none of them is mechanised in a proof assistant, and only one of them is related to a tool that can provide actual source code [26]. The main differences with our approach are that LOGS starts from an imperative and local view of parts of the program to build a larger one by parallel composition, whereas we start from a functional and global view, and that our framework is developed in Coq. BSP-Why [8] is an extension of the Why2 system for the deductive verification of imperative BSP programs. The tool is based on the generation of verification conditions from user annotations, these conditions being proved automatically by SMT solvers or interactively with Coq. BSP-Why does not support program transformation and higher-order functions.

Although there are systems that support program calculation (for example [25]), they lack the rich set of theories that exist for Coq and that can be reused for program transformation. Moreover, Coq offers a more trusted framework through its kernel. The work on polytypic programming and program transformation in Coq [17] and Agda [18] is also related. Our framework follows more closely a simple functional programming style, with additional pre/post-conditions or statements about the simple functional programs. Moreover, for the parallel aspects it would be a challenge to provide a correspondence between polytypic sequential functions and efficient parallel ones.

To our knowledge, there are only three works associating data parallelism and proof assistants. The operational semantics of a type-safe subset of Data Parallel C is formalised in Isabelle/HOL [6]. In this approach, Isabelle/HOL expressions that represent programs are generated and manipulated; it is therefore a deep embedding approach. Swierstra [20] formalised mutable arrays in Agda, and added explicit distributions to these arrays. He used them to reason about a distributed map and a distributed sum on these arrays. In BSML the distribution of parallel vectors is fixed, but it is possible to define a higher-level data structure on top of parallel vectors and to consider various distributions of the data structure in parallel vectors [3]. Moreover, it is possible to formalise mutable arrays, and even to extract such imperative programs to OCaml, as done by Malecha et al. [16]. Lupinski et al. [15] formalised the semantics of a skeletal parallel programming language. It is also a deep embedding, which models both the high-level semantics of skeletons and their implementations in a formalised JoCaml. Our shallow embedding of BSML is more convenient for reasoning about individual programs.

8 Conclusion

The SyDPaCC framework shows that the Coq proof assistant is suited to program calculation in the Bird–Meertens Formalism tradition, and that, when using the right features of Coq, the additional work needed to have machine-checked proofs rather than pen-and-paper proofs is quite manageable. Coq also allows automatic parallelisation using algorithmic skeletons. The key points in this respect are that we provide a shallow embedding of a pure functional parallel programming library in Coq; that one can program and verify algorithmic skeletons in Coq based on this embedding; and that expressing the correctness of algorithmic skeletons with respect to usual sequential functions, in a compositional way, as instances of type classes makes it possible to use the instance resolution mechanism of Coq for automatic parallelisation. Actual programs are obtained through the extraction feature. The extracted code is combined with the unverified parallel implementation of BSML in OCaml and C+MPI. The core of SyDPaCC presented in this paper is very concise (2 kLoC of Coq, 600 LoC of OCaml and 120 LoC of C). Experimental results on the generated programs are presented in [7, 14, 21] and on the SyDPaCC website.