Keywords

1 Introduction

In several industrial standards (EN50128 for SIL3/SIL4, IEC61508 SIL3/SIL4, ISO 26262 for ASIL4), formal methods are highly recommended when developing safety critical software for the highest safety levels, for the specification, the development and/or the verification phases. However formal methods are highly recommended just like many other non-formal (combination of) techniques, as these recommendations are setup collectively and represent the industrial best practices. A generic, safe execution platform has been researched for years [5] [6], combining safety electronics and defect-free proven softwareFootnote 1. The CSSP was initially an in-house development project before being funded by the R&D collaborative project LCHIP (Low Cost High Integrity Platform) [7] to obtain a generic version of the platform (i.e. not only aimed at railway systems).

This paper shows the evolution of the CSSP, based on a number of building blocks previously used on in-house applications, from a starter kit for education to a more versatile safety computer for industry. Both hardware (based on commercial standard products) and software (based on Atelier B CASE tool as well as standard products like gcc) aspects are presented, as well as the programming model, adapted to allow more freedom for the software developer.

This paper is structured in five parts. Section 2 introduces the Terminology. Section 3 briefly introduces the B method. Section 4 introduces the CSSP SK\(_0\) and CS\(_0\). Section 5 introduces some bits of a CS0 application of a safety flasher. Section 6 concludes on this first application of the CSSP.

2 Terminology

This section contains specific definitions, concepts, and abbreviations used throughout this paper.

Atelier BFootnote 2 is an Integrated development environment (IDE) supporting the B method [1] and the B language for software development, and Event-B [2] for system-level analysis. Atelier CSSP is Atelier B extended with redundant code generator tool-chain, boot loader, and a new project type (CSSP project).

B0 is a subset of the B language that must be used at implementation level. It contains deterministic substitutions and concrete types. B0 definition depends on the target hardware associated to a code generator.

CRC: cyclic redundancy check [8], is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data.

HEX is a file format [9] that conveys binary information in ASCII text form. It is commonly used for programming microcontrollers, EPROMs, and other types of programmable logic devices.

PLC: programmable logic controller [10], is an industrial digital computer which has been ruggedized and adapted for the control of any activity that requires high reliability control and ease of programming and process fault diagnosis.

Safety computer usually refers to a computer controlling a system where the emission of an erroneous output could injure or kill people. Safety techniques (error detection, redundancy, etc.) have to be used to lower the probability of occurrence of such a failure below an acceptable level defined by standards.

SIL: Safety Integrity Level [11], is a relative level of risk-reduction provided by a safety function. Its range is usually between 0 and 4, SIL4 being the most dependable and used for situations where people could die.

Reliability is the ability of a system to continuously perform its required functions under stated conditions for a specified time.

Output states (memory vs physical). A controller computes new values for its outputs every cycle. These values (stored in memory) are used to change (with signal conversion components) the physical state of the outputs. Identity between values in memory and outputs physical state are checked regularly (with a monitor checking the signal conversion) to assess if the controller is still able to control. Depending on how the outputs are implemented (relays, transistors), changing state may take more or less time and identity check has to be delayed accordingly.

3 Introduction to the B Method

B [1] is a method for specifying, designing, and coding software systems. It covers central aspects of the software life cycle (Fig. 1): the writing of the technical specification, the design by successive refinement steps and model decomposition (layered architecture), and the source code generation.

Fig. 1.
figure 1

A typical B development cycle, from requirements to code.

B is also a modelling language that is used for both specification, refinement (Fig. 2), and implementation (Fig. 3). It relies on substitution calculus, first order logic and set theory. All modelling activities are covered by mathematical proof that finally ensures that the software system is correct.

B is structured with modules and refinements. A module is used to break down a large software into smaller parts. A module has a specification (called a machine) where both a static and a dynamic description of the requirements are formalised. It defines a mathematical model of the subsystem composed of:

  • an abstract description of its state space and possible initial states,

  • an abstract description of operations to query or modify the state.

This model establishes the external interface for that module: every implementation will conform to this specification. Conformance is assured by proof during the formal development process. A module specification is refined. It is re-expressed with more information: adding some requirements, refining abstract notions with more concrete notions, getting to implementable code level. Data refinement introduces new variables to represent the state variables for the refined component, with their linking invariant. Algorithmic refinement transforms the operations of the refined component. A refinement may also be refined. The final refinement of a refinement sequence is called the implementation, it contains only B0-compliant models. In a component (machine, refinement, or implementation), sets, constants, and variables define the state space while the invariants define the static properties for its state variables. The initialisation phase (for the state variables) and the operations (for querying or modifying the state) define the way variables are modified. From these, proof obligations are computed such as: the static properties are consistent, they are established by the initialisation, and they are preserved by all the operations. Atelier B contains a model editor merging model and proof (Fig. 4) by displaying the number of proof obligations associated to any line of a B model, its current proof status (fully proved or not) and the body of the related proof obligations.

Fig. 2.
figure 2

Structure of MACHINE and REFINEMENT components.

Fig. 3.
figure 3

Structure of IMPLEMENTATION component.

Fig. 4.
figure 4

Atelier B model editor showing proof status.

Finally a B project is a set of linked B modules. Each module is formed of components: an abstract machine (its specification), possibly some refinements and an implementation. The principal dependencies between modules are IMPORTS links (forming a modular decomposition tree) and SEES links (read only transversal visibility). Sub-projects may be grouped into libraries. A software developed in B may integrate or may be integrated with traditionally developed code.

4 CLEARSY Safety Platform

The CLEARSY Safety Platform is a generic PLC able to perform command and control over inputs and outputs. For safety critical applications, the PLC has to be able to determine whether it is fully functional or not. In case of failure, the PLC should move to restrictive mode where all the outputs are deactivated. The stronger the risk of harming people in case of failure, the higher the Safety Integrity Level. For SIL3 and SIL4, the computations have to be performed by a minimum of two processors and checked with a voting system.

The CLEARSY Safety Platform is made of two parts: an IDE to develop the software and an electronic board to execute this software. The IDE is based on Atelier B and extended with dedicated tool-chains to generate and check bare-metal applications (no underlying operating system). The electronic board embeds two microcontrollers PIC32 able to deliver around 50 DMIPS. From the standards, their failure rate is considered as \(10^{-5}/\)h.

Fig. 5.
figure 5

The CSSP starter kit SK\(_0\) board, with its two microcontrollers \(\mu C_1\) and \(\mu C_2\). It embeds 3 digital inputs and 2 digital outputs. To increase the number of usable I/O, boards can be connected through the serial bus: inputs states are propagated through all boards, all boards executing the same complete logic taking into account all the inputs. The board is programmed and monitored from the Atelier CSSP IDE through a USB link.

The overall safety principlesFootnote 3 are common to all instances of the CLEARSY Safety Platform (starter kits SK\(_0\) and SK\(_1\), Core CS\(_0\)):

  • the vital part of the application is developed and proved with B, and produces two different binaries B\(_1\) and B\(_2\) (so-called REPLICATED) obtained by two diverse code generation and compilation chains. Both binaries are going to be executed sequentially and their (replicated) variables checked to be equal.

  • the non-vital part NV is developed in C, including the top-level sequencer.

  • the two identical microcontrollers PIC32 execute the same software, made of the B\(_1\), B\(_2\), and NV. Software is protected by CRC, and CRC verification is done at bootload (upload) and during execution. Memory (variables) is checked during execution. Messages are exchanged between the two microcontrollers to ensure liveness.

  • permissive outputs (high state) require both microcontrollers to activate these outputs (provide energy). The failing of the two microcontrollers at the same time is neglected as common failures modes are avoided due to the hardware design, software tool chain, and verification performed.

The CSSP is available as a starter kit, mainly aimed at education, and as an industry-ready version, for safety critical applications.

4.1 CSSP Starter Kit

4.1.1 Description

Available since end of 2017, the board provides 3 digital inputs and 2 digital outputs (Fig. 5), and a complete development environment to get initiated to the programming in B of the platform.

The kind of application is limited to control and command while executing cyclically the following functions: read inputs, perform computation, set outputsFootnote 4. The safety principles are built-in in the safety library. The developer is only able to fill the user_logic function (Fig. 6). Safety principles are out of reach of the developer who cannot alter them.

Fig. 6.
figure 6

Modelling the user_logic function. In the central component box, the substitution skip has to be replaced by a user-defined substitution involving inputs and outputs variables. This specification substitution is used to verify its implementation. As such, it has to be precise enough to enable meaningful proof.

4.1.2 Development Cycle

The SK\(_0\) development cycle is as follows:

  • specification model is written first from the natural language requirements (Function), then comes the implementation model, both using the same language (B).

  • models are proved to be coherent and to be correct refinements. The proof is automatic if the complexity of the model is not too high for the Atelier B theorem prover. Frequent interactive demonstrations can be turned in proof tactics to be applied automatically,

  • source code or binary is generated from implementation model:

    • Binary 1 (HEX file) is directly compiled from the implementation B model. The compiler has been developed in-house for supporting this technology.

    • Binary 2 (HEX file as well) is generated in two steps. First, Implementation models are translated to C, using the Atelier B C code generator. Then the C code is compiled with gcc (Fig. 7).

Fig. 7.
figure 7

CSSP application development and execution.

  • The two binaries are linked to a top-level sequencer and a safety library, both software developed in B by the CSSP IDE development team once for all, to constitute the final software.

  • This software is then loaded on the flash memory of the two microcontrollers (bootload mode).

  • When the board enters the execution mode or is reset, the content of the flash memory is copied in RAM for both microcontrollers which start executing it.

  • For each microcontroller, the top-level sequencer enters a never-ending loop and

    • calls in sequence Binary 1 then Binary 2 for one iteration

    • calls the safety library in charge of performing verification.

    • If the verification fails, the board enters panic mode, deactivates its outputs and enters an infinite loop doing nothing.

4.1.3 Return of Experience

The platform has been presented and experimented at the occasion of several hands-on sessions organised at university sites in Europe, North and South America. Teaching formal methods is eased as students are able to see their model running in and interacting with the physical world. It was the occasion to demonstrate how formal methods could be used with embedded systems and IoT. Less theoretic profiles may be introduced/educated to more abstract aspects of computation. Clock (synchronous modelling) and combinatorial (asynchronous modelling) exercises were a starting point for specification enrichment and the discovery of the formal proof. However the restricted development framework and the little number of I/Os is seen as a limitation preventing its real adoption in the industry.

4.2 CSSP Core

4.2.1 Description

In order to counterbalance the Return of Experience (REX) collected for the SK\(_0\) and to obtain a hardware generic enough to qualify for most safety critical application fields, the core computer, so-called CS\(_0\) and based on two redundant PIC32 microcontrollers, has been made independent from the power supply and its I/Os. The CS\(_0\) has a smart card format and has to be plugged on a motherboard providing energy and connection to I/Os. A first version of this motherboard (Fig. 8) has been made available beginning of 2020, together with the CS\(_0\), providing 32 digital inputs and 32 digital outputs.

Fig. 8.
figure 8

The CSSP Core CS\(_0\) board plugged on its hosting motherboard.

4.2.2 Development Cycle

The development cycle is similar to the SK\(_0\)’s one:

  • B models are used to develop vital functions (so-called REPLICATED) that will be used to generate two different binaries with two different tool-chains.

  • HEX files are obtained either via direct compilation from B0 models or via C code generation then C compilation.

However several differences offer far more degrees of freedom with the architecture/behaviour of the application:

  • B models may represent both functions executed cyclically by the application sequencer and functions executed during interrupt. Several entry points may be used with the B project.

  • The developer has to define its own sequencing - the way the functions are called in sequence.

  • The developer is able to add any non-vital software and to link it with the rest of the application.

  • Computation between \(\mu C_1\) and \(\mu C_2\) could be asymmetric, one microcontroller doing more/different things than the other one.

Fig. 9.
figure 9

Architecture of a CSSP application, made of C and B code.

Figure 9 shows that a CSSP application is composed of a vital part (left), replicated and modelled with B, and a non-vital part (right), developed in C. Five C functions have to be implemented, including the main loop api_main which has to call regularly the functions named csp_*_task in charge of the safety verification. For example the Clock drift task manages all the time drift related topics. As a general statement a task is always non-blocking and execute fast (less than 200 \(\upmu \)s). The regularity of the calls have to comply with Fig. 10 where the minimum calling frequencies are defined. If not called often enough, the safety tasks will trigger a shutdown of the platform (which impacts the availability of the product but not its safety).

From the no vital C code, replicated code may be activated in sequence. This code may:

  • use replicated vital variables - their contents is checked regularly to be equal among the two replicas.

  • use replicated no checked variables - their contents is not checked, allowing to have different algorithms being executed on the two replicas.

  • use special function registers (SFR) used for hardware configuration (to program pins to be input or output for example).

  • access non replicated variables defined in the non vital C code.

  • call non vital user defined function through a BASIC machineFootnote 5.

Fig. 10.
figure 10

Minimum expected call frequency of the safety tasks.

To enable the cohabitation of vital and non vital software, several software interfaces have been defined:

  • identifiers allow to discriminate vital (replicated) code from non-vital one. For example:

    • nr_v__var1 represents the variable var1 from non-replicated code,

    • r1_f__Foo() and r2_f__Foo() represent the function Foo compiled with respectively the tool-chain1 1 and 2.

  • Exposing non-vital C variable to the B model. During design you may need to forward non-vital data acquired by the non vital layer to the vital software implemented in B. The information can be transmitted to the vital software through two possible means: as a parameter of the replicated function or as a shared variable read by the two replicated functions.

  • several configuration files defining the nature of the variables, constants, functions, memory mapping, etc.

4.3 Services

The CLEARSY Safety Platform offers several vital services:

  • Vital voter of memory payload: if you want to secure an array of memory. For instance if you want to emit a vital message on a network.

  • Vital voter for register: if you want to apply a vital value into a register of the microcontroller.

  • Minimum timing guarantee: if you want to ensure that a minimum duration elapsed since a given event occurs, before triggering an action.

  • Deadline guarantee: if you want to ensure that a given permissive state is not maintained longer than a timeout or if you want to guarantee that a given action is executed more often that a timeout.

Similarly the CLEARSY Safety Platform offers a number of non-vital services:

  • Utility functions for type and endianess conversion

  • Communication layer between the two microcontrollers: if you want to exchange any kind of data (vital or non-vital) between the two processors.

  • Utility functions for controlling the debug LEDs of the Safety Computer Board.

  • Utility functions to retrieve the globally unique serial number: if you want to have a unique number in your design or if you need to send serial number to CLEARSY for troubleshooting.

  • Utility functions to retrieve temperature from the built-in sensor

  • Utility functions for reading or writing into the 1kB EEPROM provided by the Safety Computer Board.

4.4 Safety Case

The CSSP CS\(_0\) is a generic safety computer. The following failure modes are covered by the CSSP built-in mechanisms and thus do not need to be taken into account in the safety case of the user application:

  • Random transient upset in the duplicated variable (i.e. the one prefixed with r_v__ in the user vital software)

  • Permanent or transient modification of specific registers (SFR, CP0)

  • Modification of the flash memory content

  • Damage on the Random Access Memory (RAM)

  • Compilation error on the vital software (the one written in B0)

  • Perturbation on the clock frequency and timing variable

  • Correct software has been uploaded on the target

  • Damage on the Arithmetic and Logic Unit

Fig. 11.
figure 11

Safety related application conditions.

However the CSSP CS\(_0\) is not sufficient per se as it needs a minima I/OFootnote 6 and a power supply, to be provided by a hosting motherboard. The motherboard together with the CSSP CS\(_0\) have to comply to a number of safety conditions, named Safety Related Applications Conditions (SRAC) in order to include the CSSP CS\(_0\) certificate into its safety case.

4.5 Safety Related Application Conditions

The application developed with the CSSP CS\(_0\) shall meet the 23 SRACFootnote 7 in Fig. 11, either related to the software, hardware, system, or to the exploitation conditions. Otherwise the certificate of the CSSP is not valid and none of the properties offered by the CSSP can be considered as granted.

Each SRAC is structured as follows :

  • Statement: official text of the SRAC

  • Verification proposal: Example of how to check if the SRAC is fulfilled or not

  • Associated risk: Explanation of the associated hazard if the SRAC is not fulfilled.

The verification proposal is provided as example or suggestion. The end user and the application team is free of using any means and method to verify and validate that the SRAC are respected. The proposal is only provided for helping the end user to determine how the compliance to the SRACs can be claimed.

5 The Safety Flasher

The safety flasher is a simple applicationFootnote 8, making a bulb to flash - the bulb is alternatively switched on and off. It has been selected as a first example as it is simple to present and to specify, allowing to introduce the different elements of a CSSP development and how they relate, without diving at the same time into the complexity of the application. The chosen safety property is also trivial.

This application is very similar in its nature to the CSSP clock example [4], except that with the CS\(_0\), the specification is not “the output is being updated” but rather, to be sure that the bulb is not on all the time. In other terms, before switching the bulb on, we need to be sure that the bulb has been continuously off during “enough time”. In other terms:

Definition 1

For any t0 before now where the bulb is on, there is t1 not too old (greater than periodMax+periodMin) such that on t1..t1+periodMin the bulb was off.

Fig. 12.
figure 12

The beginning of the handwritten application, never ending, main loop. The first four tasks are called at each iteration. csp_vitalVariables_task is called when the target frequency is reached.

The bulb with permanent switched-off aspect represents the safe default mode, either if the flasher is off or if a failure was detected, leading to a restrictive mode where outputs are deactivated.

In the top-level B component (csp_safetyFlasher), it translates (first refinement) in:

figure a

The api_main function (Fig. 12) shows that inside the infinite loop, several functions (named as csp__task) are called based on elapsed time (but with different delays). These functions are in charge of performing the safety verifications of the board. The calling of these functions in due time is the responsibility of the developerFootnote 9. If these functions (performing the verifications indicated by their name and in relation with the SRACs), are not called (often enough - Fig. 10), the board will enter a restrictive mode where outputs are deactivated.

The B component contains four functions, one is called directly by the application, the three others are called by interrupt:

  • r_f__safetyFlasher_flash: the main logic called cyclically (Fig. 13)

  • r_f__safetyFlasher_testOutput: safety code called in the interrupt function to generate the safety frequency (20 kHZ on \(\mu C_1\) or 12 kHZ on \(\mu C_2\));

  • safetyFlasher_watchdogTimer: called by the watchdog that check that the watchdog deadline for this function has not been reached.

  • r_f__safetyFlasher_getAddressAndValue: used in the interrupt routine that generates the safety frequency to comply with the SRAC “Output watchdog interlocking”, ensuring that if the safety watchdog dies, all outputs are restrictively blocked.

In the rest of the document, we ignore the low-level interrupt operations and focus on the cyclic operation safetyFlasher_flash (Fig. 14).

Fig. 13.
figure 13

The handwritten flasherTask() function called cyclically by the application main loop. It calls in sequence the two replicas of the operation safetyFlasher_flash.

Fig. 14.
figure 14

The replicated function safetyFlasher_flash, as it is manually implemented in B. 3 local variables are declared with the clause VAR. Their type is then provided (unit32_t, BOOL). The first call to the operation vitalClock_checkAccuracy provides a value to the variable lb. Operation calls and tests are always performed in sequence, not at the same time. Operations get_add_uint64 and get_lt_uint64 are used for respectively performing addition and testing order of unsigned 64-bit integer values.

In order to retrieve the current value of the elapsed time since startup the software designer can call the function nr_f_getTime() (Fig. 13) which returns a uint64_t corresponding to a non-vital estimate of the time since startup expressed in 125 \(\upmu \)s step. Before using this value in the B0-implemented vital software, the designer shall check the accuracy of the value provided by the non-vital layer. This check has to be done by calling the function vitalClock_checkAccuracy (clock, accuracy) which returns true only if the clock from the non-vital software is equal to the actual time (now) at a given tolerance (accuracy parameter). The tolerance takes into account temperature, ageing and standard drift of quartz over the lifetime of the product. Predefined operations are used for 64-bit computation: get_add_uint64 for the addition of two 64-bit integers, get_lt_uint64 for their comparison (“lesser than”). As only uint32_t type is available, 64-bit integers are decomposed into two 32-bit (most-significant and less-significant) integers.

The B modelling contains a total of 6 machines, 3 intermediate refinements, and 4 implementations. The csp_safetyFlasher_i implementation contains less than 200 lines. The overall B project is proved automatically with the addition of 13 mathematical rules (UNION, modulo, sequence, intervals) and around 70 proof tactics. The safetyFlasher_main.c file contains 330 lines, 200 related to the application main loop.

6 Conclusion and Perspectives

The CSSP provides a consistent environment to seamlessly develop proven command and control applications. Its objective is to reduce the development and certification efforts by not requiring expert resources to complete and by simplifying the safety demonstration. It has evolved recently to provide a framework more versatile and usable for the development of safety critical applications. This framework makes precise the various verifications to complete on the final systems, including software and hardware ones. Many of these have to be completed manually (they are not covered by a formal model or a formal proof). Similarly, with the possibility to add your own non-vital code to the application, new risks appear to make a mistake and to perturb the behaviour of the device. It is needed to assess in the future how much effort/time is still saved when using it. An on-going project, RAIL-MAPFootnote 10, has started to use it for the demonstration of a Railway Modular Automatic Pilot.