Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Nowadays, the new generations of distributed embedded control systems are more and more sophisticated since they require new forms of properties such as reconfigurability, reusability, agility, adaptability and fault-tolerance. The first three properties are offered by new advanced component-based technologies, whereas the last two properties are ensured by new technical solutions such as multi-agent architectures.

New generations of component-based technologies have recently gained popularity in industrial software engineering since it is possible to reuse already developed and deployed software components from rich libraries. A Control Component is a software unit owning data of the functional scheme of the system. This advantage reduces the time to market and allows minimizations of the design complexity by supporting the system’s software modularity. This chapter deals with run-time automatic reconfigurations of component-based applications by using multi-agent solutions. An agent is assumed to be a software unit allowing the control of the system as well as its environment before applying automatic reconfigurations. The reasons for which reconfigurations may be taken are classified into two categories [33]: (1) corrective reasons: if there is one component which is misbehaving, then it is automatically substituted by a new one which is assumed to run correctly. The new component is supposed to have the same functionalities as the old one. (2) Adaptive reasons: even the component-based application is running well, dynamic adaptations may be needed as a response to the new environment evolutions, in order to extend new functionalities or to improve some required functional properties. Dynamic reconfigurations can cover the following issues: (1) architecture level which means the set of components to be loaded in memory to constitute the implemented solution of the assumed system; (2) control level which means the compositions of components; (3) implementation level which means the behavior of components encoded by algorithms; and (4) data level which means the global values. We define a multi-agent architecture for reconfigurable embedded control systems where a Reconfiguration Agent is affected to each device of the execution environment to apply automatic reconfigurations of local components, and a Coordination Agent which is used for coordination between distributed Reconfiguration Agents in order to allow coherent distributed reconfigurations. The Coordination Agent is based on a coordination protocol using coordination matrices which define coherent simultaneous reconfigurations of distributed devices. We propose useful meta-models for Control Components and also for intelligent agents. These meta-models are used to implement adaptive embedded control systems. As we choose to apply dynamic scenarios, the system should run even during automatic reconfigurations, while preserving correct executions of functional tasks.

Given that Control Components are defined in general to run sequentially, this feature is inconvenient for real-time applications which typically handle several inputs and outputs in a too short time constraint. To meet performance and timing requirements, a real-time must be designed for concurrency. To do so, we define at the operational level some sequential program units called real-time tasks. Thus, we define a real-time task as a set of Control Components having some real-time constraints. We characterize a task by a set of properties independently from any Real Time Operating System (RTOS). We define service processes as software processes for tasks to provide system’s functionalities, and define reconfiguration processes as tasks to apply reconfiguration scenarios at run-time. In fact, service processes are functional tasks of components to be reconfigured by reconfiguration processes. To guarantee a correct and safety behavior of the system, we use semaphores to ensure the synchronization between processes. We apply the famous algorithm of synchronization between reader and writer processes such that executing a service is considered as a reader and reconfiguring a component is assumed to be a writer process. The proposed algorithm ensures that many service processes can be simultaneously executed, whereas reconfiguration processes must have exclusive access. We study in particular the scheduling of tasks through a Real Time Operating System. We apply the priority ceiling protocol proposed by Sha et al. [49] to avoid the problem of priority inversion as well as the deadlock between the different tasks. The priority ceiling protocol supposes that each semaphore is assigned a priority ceiling which is equal to the highest priority task using this semaphore. Any task is only allowed to enter its critical section if its assigned priority is higher than the priority ceilings of all semaphores currently locked by other tasks.

In this chapter, we continue our research by proposing an original implementation of this agent-based architecture. We assume that agent controls the plant to ensure the system running physically. The design and the implementation of such agent under Real-Time constraints are the scope of this study. The main contributions of this chapter are the following: (1) a complete study of Safety Reconfigurable Embedded Control Systems from the functional level (i.e. dynamic reconfiguration system with a multi-agent system) to the operational level (i.e. decomposition of the system into a set of tasks with time constraints); (2) a global definition of real-time task with its necessary parameters independently from any real-time operating system; (3) the scheduling of these real-time tasks considered as periodic tasks with precedence and mutual exclusion constraints. To our best of knowledge, there is no research works which deal with these different points together.

We present in Sect. 2 the state of art about dynamic reconfiguration. Section 3 presents the benchmark production systems FESTO and EnAS that we follow as running examples in the chapter. We define in Sect. 4 a multi-agent architecture and the communication protocol to ensure safety in a distributed embedded control systems. Section 5 presents the real-time task model and studies the safety of its dynamic reconfiguration as well as the scheduling between the different tasks. We finally conclude the chapter in Sect. 6.

2 Dynamic Reconfiguration

The new generation of industrial control systems is addressing today new criteria as flexibility and agility [43, 48]. We distinguish two reconfiguration policies: static and dynamic policies such that static reconfigurations are applied off-line to apply changes before any system cold start [3], whereas dynamic reconfigurations are dynamically applied at run-time. Two cases exist in the last policy: manual reconfigurations applied by users [47] and automatic reconfigurations applied by intelligent agents [2]. We are interested in automatic reconfigurations of an agent-based embedded control system when hardware or software faults occur at run-time. The system is implemented by different complex networks of Control Components. In literature, there are various studies about dynamic reconfigurations applied to component-based applications. Each study has its strength and its weakness. In the article [35], the authors propose to block all nodes involved in transactions (considered as sets of interactions between components) to realize dynamic reconfigurations. This study has influenced many research works later. Any reconfiguration should respect the consistency propriety which is defined as sets of logical constraints. A major disadvantage of this approach is the necessity to stop all components involved in a transaction. In the article [4], problem of dynamic reconfigurations in CORBA is treated. The authors consider that consistency is related to Remote Procedure Call Integrity. To ensure this property, they propose to block the incoming before the outgoing links. However, the connection between components must be acyclic in order to be able to block connections in the right order. A dynamic reconfiguration language based on features [41] is proposed. The authors use the control language MANIFOLD where processes are considered as black boxes having ports of communication. In this case, the communication is anonymous. The processes having access to shared data are connected in cyclic manners to wait tokens that visit each one at turn (as in token ring). Although the novelty of this solution, there is a loss of time especially at waiting until receiving the token to access to the shared data or also to reconfigure the system. Another study [46] is proposed to apply dynamic updates on graphical components (for example button, graphical interface, \(\ldots \)) in a .Net framework. To do so, the authors associate for each graphical component an appropriate running thread. The synchronization is ensured through the reader-writer-locks. The dynamic reconfiguration is based on blocking all involved connections. Due to rw-locks, this solution works only on local applications. In addition, they define [45] a new reconfiguration algorithm ReDAC (Reconfiguration of Distributed Application with Cyclic dependencies) ensuring dynamic reconfigurations in distributed systems to be based on running multi-threads. This algorithm is applied to capsules which are defined as groups of running components. As disadvantage, the proposed algorithm uses counter variables to count on-going method calls for threads which lead to consume further space memory and treatment time.

To our best of knowledge, there is no research works which treat the problem of dynamic software reconfigurations of component-based technology with semaphores. The novelty of this chapter is the study of dynamic reconfiguration with semaphore ensuring the following points: (1) blocking connections without blocking involved components; (2) safety and correctness of the proposed solution; (3) independence of any specific language; (4) verification of consistency (i.e. logical constraints) delegated to the software agent; (5) suitable for large-scale applications.

3 Benchmark Production Systems: FESTO and EnAS

We present two Benchmark Production SystemsFootnote 1: FESTO and EnAS available in the research laboratory at the Martin Luther University in Germany.

3.1 The FESTO System

The FESTO Benchmark Production System is a well-documented demonstrator used by many universities for research and education purposes, and it is used as a running example in the context of this chapter. FESTO is composed of three units: Distribution, Test and Processing units. The Distribution unit is composed of a pneumatic feeder and a converter to forward cylindrical work pieces from a stack to the testing unit which is composed of the detector, the tester and the elevator. This unit performs checks on work pieces for height, material type and color. Work pieces that successfully pass this check are forwarded to the rotating disk of the Processing unit, where the drilling of the work piece is performed. We assume in this research work two drilling machines \(Drill\_machine1\) and \(Drill\_machine2\) to drill pieces. The result of the drilling operation is next checked by the checking machine and the work piece is forwarded to another mechanical unit. In this research chapter, three production modes of FESTO are considered according to the rate of input pieces denoted by \(number\_pieces\) into the system (i.e. ejected by the feeder).

  • Case 1: High production. If \(number\_pieces \ge Constant1\), then the two drilling machines are used at the same time in order to accelerate the production. In this case, the Distribution and the Testing units have to forward two successive pieces to the rotating disc before starting the drilling with \(Drill\_machine1\) AND \(Drill\_machine2\). For this production mode, the periodicity of input pieces is \(p = 11~{\text {s}}\).

  • Case 2: Medium production. If \(Constant2 \le number\_pieces <\) \(Constant1\), then we use \(Drill\_machine1\) OR \(Drill\_machine2\) to drill work pieces. For this production mode, the periodicity of input pieces is \(p = 30~{\text {s}}\).

  • Case 3: Light production. If \(number\_pieces < Constant2\), then only the drilling machine \(Drill\_machine1\) is used. For this production mode, the periodicity of input pieces is \(p = 50~{\text {s}}\).

On the other hand, if one of the drilling machines is broken at run-time, then we have to only use the other one. In this case, we reduce the periodicity of input pieces to \(p = 40~{\text {s}}\). The system is completely stopped in the worst case if the two drilling machines are broken.

3.2 The EnAS System

The Benchmark Production System EnAS was designed as a prototype to demonstrate energy-antarcic actuator/sensor systems. For the sale of this contribution, we assume that it has the following behavior: it transports pieces from the production system (i.e. FESTO system) into storing units. The pieces in EnAS shall be placed inside tins to close with caps afterwards. Two different production strategies can be applied: we place in each tin one or two pieces according to production rates of pieces, tins and caps. We denote respectively by \(nb_{pieces}, nb_{tins+caps}\) the production number of pieces and tins (as well as caps) per hour and by Threshold a variable (defined in user requirements) to choose the adequate production strategy. The EnAS system is mainly composed of a belt, two Jack stations (\(J_{1}\) and \(J_{2}\)) and two Gripper stations (\(G_{1}\) and \(G_{2}\)). The Jack stations place new produced pieces and close tins with caps, whereas the Gripper stations remove charged tins from the belt into storing units. Initially, the belt moves a particular pallet containing a tin and a cap into the first Jack station \(J_{1}\).

According to production parameters, we distinguish two cases,

  • First production policy: If (\(nb_{pieces} / nb_{tins+caps} \le Threshold)\), then the Jack station \(J_{1}\) places from the production station a new piece and closes the tin with the cap. In this case, the Gripper station \(G_{1}\) removes the tin from the belt into the storing station \(St_{1}\).

  • Second production policy: If (\(nb_{pieces} / nb_{tins+caps} > Threshold\)), then the Jack station \(J_{1}\) places just a piece in the tin which is moved thereafter into the second Jack station to place a second new piece. Once \(J_{2}\) closes the tin with a cap, the belt moves the pallet into the Gripper station \(G_{2}\) to remove the tin (with two pieces) into the second storing station \(St_{2}\).

4 Multi-agent System

We define a multi-agent architecture for distributed safety systems. Each reconfiguration agent is affected in this architecture to a device of the execution environment to ensure Functional Safety. Nevertheless, the coordination between agents in this distributed architecture is inevitable because any individual decision may affect the performance of the others. To guarantee safe distributed reconfigurations, we define the concept of Coordination Matrix that defines correct reconfiguration scenarios to be applied simultaneously in distributed devices and we define the concept of Coordination Agent that handles coordination matrices to coordinate between distributed agents. We propose a communication protocol between agents to manage concurrent distributed reconfiguration scenarios.

Table 1 The agent characteristics

The communication protocol between agents respects the different following points: (1) The Reconfiguration agents control the plant constituted by several physical processes. (2) At the beginning, all the Reconfiguration agents are assigned a specific reconfiguration. (3) The Reconfiguration agent controlling the system can not apply more than one reconfiguration at any time. (4) The Reconfiguration agent decides to apply a new reconfiguration if some conditions are verified. (5) The Reconfiguration may be applied in a local system (in this case, only the associated Reconfiguration agent is concerned) or in a distributed system (in this case, many Reconfiguration agents have to coordinate together to put the whole system in a safe state). (6) The Reconfiguration agent does not know if the other agents will cooperate to put the system into safe state. (7) At the reception of a reconfiguration request, the agent chooses one action from the available possibilities (accept or refuse). The Reconfiguration agent may refuse the request if it is not possible to apply this new reconfiguration. (8) An agent is called cooperative if it always accepts the reconfiguration request. An agent is called selfish if it always refuses the new reconfiguration.

Before introducing the communication protocol, we begin with presenting a Reconfiguration Agent as well as the coordination agent. To resume the characteristics of each one, the Table 1 presents the main information.

4.1 Software Architecture of Reconfiguration Agents

We propose an agent-based architecture to control embedded systems at run-time. The agent checks the environment’s evolution and reacts when new events occur by adding, removing or updating Control Components of the system. To describe the dynamic behavior of an intelligent agent that dynamically controls the plant, we use nested state machines in which states correspond to finite state machines. A finite state machine can be defined as a state machine whose states, inputs and outputs are enumerated. The nested state machine is represented as the following:

$$\begin{aligned} \text {NSM} = (SM_1, SM_2, \ldots , SM_n) \end{aligned}$$

Each state machine (\(SM_i\)) is a graph of states and transitions. A state machine treats the several events that may occur by detecting them and responding to each one appropriately. We define a state machine as the following:

$$\begin{aligned} SM_i = (S_i, S_{i0},I_i, O_i, Pre{\text {-}}cond_i, Post{\text {-}}cond_i, t_i) \end{aligned}$$
  • \(S_i = \{s_{i1}, \ldots , s_{ip}\} \): the states;

  • \(S_{i0}\) the initial state;

  • \( I_i = \{I_{i1}, \ldots , I_{im}\} \): the input events;

  • \(O_i = \{O_{i1}, \ldots , O_{ik}\} \): the ouput events;

  • \(Pre{\text {-}}cond_i\) : the set of conditions to be verified before the activation of a state;

  • \(Post{\text {-}}cond_i\): the set of conditions to be verified once a state is activated;

  • \(t_i:S_i \times I_i \rightarrow S_i \): the transition function.

Fig. 1
figure 1

The Meta-model nested state machine

We propose a conceptual model for a nested state machine in Fig. 1 where we define the classes Nested State Machine, State machine, State, Transition, Event and Condition. The Nested State Machine class contains a certain number of State machine classes. This relation is represented by a composition. The Transition class is double linked to the State class because a transition is considered as an association between two states. Each transition has an event that is considered as a trigger to fire it and a set of conditions to be verified. This association between the Transition class and Event and Condition classes exists and is modeled by the aggregation relation.

We propose a generic architecture for intelligent agents depicted in Fig. 2. This architecture consists of the following parts: (1) the Event Queue to save different input events that may take place in the system, (2) the intelligent software agent that reads an input event from the Event Queue and reacts as soon as possible, (3) the set of state machines such that each one is composed of a set of states, (4) each state represents a specific information about the system. The agent, based on nested state machines, determines the new system’s state to execute according to event inputs and also conditions to be satisfied. This solution has the following characteristics: (1) The control agent design is general enough to cope with various kinds of embedded-software based-component application. Therefore, the agent is uncoupled from the application and from its Control Components. (2) The agent is independent of nested state machines: it permits to change the structure of nested state machines (add state machines, change connections, change input events, and so on) without having to change the implementation of the agent. This ensures that the agent continues to work correctly even in case of modification of state machines. (3) The agent is not supposed to know components that it has to add or remove in a reconfiguration case.

Fig. 2
figure 2

The internal agent behavior

In the following algorithm, the symbol \(Q\) is an event queue which holds incoming event instances, \(ev\) refers to an event input, \(S_i\) represents a State Machine, and \(s_{i,j}\) a state related to a State Machine \(S_i\). The internal behavior of the agent is defined as follow:

  1. 1.

    the agent reads the first event \(ev\) from the queue \(Q\);

  2. 2.

    searches from the top to the bottom in the different state machines;

  3. 3.

    within the state machine \(SM_i\), the agent verifies if \(ev\) is considered as an event input to the current state \(s_{i,j}\) (i.e. ev \(\in \) I related to \(s_{i,j}\)). In this case, the agent searches the states considered as successor for the state \(s_{i,j}\) (states in the same state machine \(SM_i\) or in another state machine \(SM_l\));

  4. 4.

    the agent executes the operations related to the different states;

  5. 5.

    repeats the same steps (1–4) until no more event exists in the queue to be treated.

figure a

First of all, the agent evaluates the pre-condition of the state \(s_{i,j}\). If it is false, then the agent exits, Else the agent determines the list of Control Components concerned by this reconfiguration, before applies the required reconfiguration for each one. Finally, it evaluates the post-condition of the state \(s_{i,j}\) and generates errors whenever it is false.

figure b

4.2 Communication Protocol

To guarantee safe distributed reconfigurations, we define the concept of Coordination Matrix that defines correct reconfiguration scenarios to be applied simultaneously in distributed devices and we define the concept of Coordination Agent that handles coordination matrices to coordinate between distributed agents.

Let \(Sys\) be a distributed safe system of \(n\) devices, and let \(Ag_{1},\ldots , Ag_{n}\) be \(n\) agents to handle automatic distributed reconfiguration scenarios of these devices. We denote in the following by \(Reconfiguration^{a}_{i_{a},j_{a},k_{a},h_{a}}\) a reconfiguration scenario applied by \(Ag_{a}\) (\(a \in [1,n]\)) as follows: (1) the corresponding \(ASM\) state machine is in the state \(ASM_{i_{a}}\). Let \(cond^{a}_{i_{a}}\) be the set of conditions to reach this state, (2) the \(CSM\) state machine is in the state \(CSM_{i_{a},j_{a}}\). Let \(cond^{a}_{j_{a}}\) be the set of conditions to reach this state, (3) the \(DSM\) state machine is in the state \(DSM_{k_{a},h_{a}}\). Let \(cond^{a}_{k_{a},h_{a}}\) be the set of conditions to reach this state. To handle coherent distributed reconfigurations that guarantee safe behaviors of the whole system \(Sys\), we define the concept of Coordination Matrix of size (n, 4) that defines coherent scenarios to be simultaneously applied by different agents. Let \(CM\) be such a matrix that we characterize as follows: each line \(a\ (a \in [1,n])\) corresponds to a reconfiguration scenario

\(Reconfiguration^{a}_{i_{a},j_{a},k_{a},h_{a}}\) to be applied by \(Ag_{a}\) as follows:

$$\begin{aligned} CM[a,1] = i_{a};\quad CM[a,2] = j_{a};\quad CM[a,3] = k_{a};\quad CM[a,4] = h_{a} \end{aligned}$$

According to this definition: If an agent \(Ag_{a}\) applies the reconfiguration scenario \(Reconfiguration^{a}\) \(_{CM[a,1],CM[a,2],CM[a,3],CM[a,4]}\), Then each other agent \(Ag_{b}\) (\(b \in [1,n] \backslash \{a\}\)) has to apply the scenario \(Reconfiguration^{b}\) \(_{CM[b,1],CM[b,2],CM[b,3],CM[b,4]}\) (Fig. 3). We denote in the following by idle agent each agent \(Ag_{b}\) (\(b \in [1,n]\)) which is not required to apply any reconfiguration when others perform scenarios defined in \(CM\). In this case:

$$\begin{aligned} CM[b,1]&= CM[b,2] = CM[b,3] = CM[b,4] = 0 \\ cond^{a}_{CM[a,1]}&= cond^{a}_{CM[a,2]} = cond^{a}_{CM[a,3],CM[a,4]} = True \end{aligned}$$
Fig. 3
figure 3

The coordination matrix

We propose a communication protocol between agents to manage concurrent distributed reconfiguration scenarios. We guarantee a coherent behavior of the whole distributed system by defining a Coordination Agent (denoted by \(CA(\xi (Sys))\)) which handles the Coordination Matrices of \(\xi (Sys)\) to control the rest of agents (i.e. \(Ag_{a}\), \(a \in [1,n]\)) as follows:

  • When a particular agent \(Ag_{a}\) (\(a \in [1,n]\)) should apply a reconfiguration scenario \(Reconfiguration^{a}_{i_{a},j_{a},k_{a},h_{a}}\) (i.e. under well-defined conditions), it sends the following request to \(CA(\xi (Sys))\) to obtain its authorization:

    $$\begin{aligned} request(Ag_{a},CA(\xi (Sys)),Reconfiguration^{a}_{i_{a},j_{a},k_{a},h_{a}}). \end{aligned}$$
  • When \(CA(\xi (Sys))\) receives this request that corresponds to a particular coordination matrix \(CM \in \xi (Sys)\) and if \(CM\) has the highest priority between all matrices of \(Concur(CM) \cup \{CM\}\), then \(CA(\xi (Sys))\) informs the agents that have simultaneously to react with \(Ag_{a}\) as defined in \(CM\). The following information is sent from \(CA(\xi (Sys))\): For each \(Ag_{b}, b \in [1,n] \setminus \{a\}\) and \(CM[b,i] \ne 0, \forall i \in [1,4]\!: reconfiguration(CA(\xi (Sys)),Ag_{b}, Reconfiguration^{b}_{CM[b,1],CM[b,2],CM[b,3],CM[b,4]})\)

  • According to well-defined conditions in the device of each \(Ag_{b}\), the \(CA(\xi (Sys))\) request can be accepted or refused by sending one of the following answers:

    • If \(cond^{b}_{i_{b}} = cond^{b}_{j_{b}} = cond^{b}_{k_{b},h_{b}} = \) True then the following reply is sent from \(Ag_{b}\) to \(CA(\xi (Sys))\!\!: possible\_reconfig(Ag_{b},CA(\xi (Sys)), Reconfiguration^{b}_{CM[b,1],CM[b,2],CM[b,3],CM[b,4]})\).

    • Else the following reply is sent from \(Ag_{b}\) to \(CA(\xi (Sys))\): \(not\_possible\_reconfig\) \((Ag_{b},CA(\xi (Sys)),Reconfiguration^{b}_{CM[b,1],CM[b,2],CM[b,3],CM[b,4]})\).

  • If \(CA(\xi (Sys))\) receives positive answers from all agents, then it authorizes reconfigurations in the concerned devices: For each \(Ag_{b}\), \(b \in [1,n]\) and \(CM[b,i] \ne 0\), \(\forall i \in [1,4]\), apply (\(Reconfiguration^{b}_{CM[b,1],CM[b,2],CM[b,3],CM[b,4]})\) in \(device_{b}\). Else If \(CA(\xi (Sys))\) receives a negative answer from a particular agent, then

    • If the reconfiguration scenario \(Reconfiguration^{a}_{i_{a},j_{a},k_{a},h_{a}}\) allows optimizations of the whole system behavior, then \(CA(\xi (Sys))\) refuses the request of \(Ag_{a}\) by sending the following reply: \(refused\_reconfiguration(CA(\xi (Sys)),Ag_{a},\) \( Reconfiguration^{a}_{CM[a,1],CM[a,2],CM[a,3],CM[a,4]}))\).

Fig. 4
figure 4

The communication scenario

When a Reconfiguration Agent (denoted by \(RA_i\)) needs to apply a new reconfiguration, it sends a request to the Coordination Agent. The Coordination Agent asks all the known Reconfiguration Agents (denoted by \(RA_j\), \( \forall j \in [1..NbR], j <> i\) where \(NbR\) represents the number of Reconfiguration Agents) if it is possible to apply the new reconfiguration introduced as parameter. The Reconfiguration Agent (\(RA_j\)) studies this proposition and sends its response which may be accept or refuse the new reconfiguration (depending on its related state). Whenever the Coordination Agent receives positive responses from all the Reconfiguration Agents (\(RA_j \forall j \in [1..NbR], j <> i\)) (i.e. the number of positives answers is equal to \(NbR - 1\)), then it decides to apply the new reconfiguration for all Reconfiguration Agents \(RA_j\) (\( \forall j \in [1..NbR]\)) by sending a confirmation message. Whenever the Coordination Agent receives only one negative response from a Reconfiguration Agents (\(RA_j, j \in [1..NbR], j <> i\)), it decides to cancel this reconfiguration and informs the corresponding agent by its decision (i.e \(RA_i\)). Figure 4 depicts the interaction between Reconfiguration and Coordination agents to ensure dynamic reconfiguration in a distributed system.

Fig. 5
figure 5

The agent-based control in a distributed system

Before sending or receiving a message, the Reconfiguration Agent searches the Coordination Agent with the method \(searchCoordinator()\). The Coordination Agent in its turn searches also the list of Reconfiguration Agents with the method \(searchAgents()\).

The method \(receive()\) used by both Coordination Agent and Reconfiguration Agent permits to receive a message sent by another agent. Whenever \(receive()\) is invoked through \(CA\_Communicate()\) and \(RA\_Communicate()\) methods, if the agent does not receive a message, it is blocked (but without blocking the other activities of the same agent).

A message is defined by the following data: (1) \(content\): the subject of the message (such as the reconfiguration to be applied); (2) \(performative\): the performative indicates what the sender wants to achieve (for example ACCEPT, REFUSE, CANCEL, CONFIRM); (3) \(time\): it is necessary to treat messages ordered by time; (4) \(sender\): the agent emitting the message; and (5) \(receiver\): the agent receiving the message. Figure 5 depicts the different classes such as \(Control Agent\), \(Coordination Agent\), \(Reconfiguration Agent\), \(Message\) and \(System\).

In the following, we present the Communicate method defined for both Reconfiguration and Coordination Agent. The \(CA\_Communicate\) method defined for the Coordination Agent has as variables: (1) \(i\) representing the reconfiguration agent which initiates the request of reconfiguration; (2) \(j\) which corresponds to the reconfiguration agent receiving the request of reconfiguration from the coordination agent; (3) \(NbR\) which represents the total number of reconfiguration agents; (4) \(NbResp\) considered as the current number of responses approving the new reconfiguration by the reconfiguration agents; (5) \(matrix\) representing the new matrix to be applied if all the reconfiguration agents accept.

figure c
figure d

The \(RA\_Communicate\) method defines the Reconfiguration Agent behavior as follows: (1) whenever the Reconfiguration Agent receives a request to apply a new reconfiguration by the Coordination Agent, it evaluates this proposition and decides whether to accept or to refuse it. The Reconfiguration Agent sends its response. (2) whenever the Reconfiguration Agent receives a confirmation to apply the new reconfiguration from the Coordination Agent, then it applies it.

figure e
figure f

We developed a complete tool “ProtocolReconf”, to verify the communication protocol. The tool “ProtocolReconf” offers the possibility to create the Reconfiguration and Coordination Agents by introducing the necessary parameters. It is required to define the different scenarios that the Reconfiguration Agent can support so that when a modification occurs in the system, it should look for the convenient reconfiguration. For the Coordination Agent, it is necessary to define the set of Coordination Matrices to apply to the whole system [21].

5 Real-Time Task: Definition, Dynamic Reconfiguration and Scheduling

In this section, we present a Real-Time Task as a general concept independently from any real-time operating system, its dynamic reconfiguration, the scheduling between several tasks and the implementation in a specific real-time operating system (which is RTLinux).

5.1 Real Time Task Definition

A real time task is considered as a process (or a thread depending on the Operating System) having its own data (such as registers, stack, \(\ldots \)) which is in competition with other tasks to have the processor execution. A task is handled by a Real-Time Operating System (RTOS) which is a system satisfying explicitly response-time constraints by supporting a scheduling method that guarantees response time especially to critical tasks.

In this paragraph, we aim to present a real-time task as a general concept independently from any real-time operating system.

To be independent from any Real-Time Operating System and to be related to our research work, we define a task \(\tau _i\) as a sequence of Control Components, where a Control Component is ready when its preceding Control Component completes its execution. \(\tau _{i,j}\) denotes the j-th Control Component of \(\tau _i\) (Fig. 6). Thus, our application consists of a set of periodic tasks \(\tau = (\tau _1, \tau _2, \ldots , \tau _n\)). All the tasks are considered as periodic this is not a limitation since non-periodic task can be handled by introducing a periodic server.

Fig. 6
figure 6

Real time task

Running Example. In the FESTO Benchmark Production System, the tasks \(\tau _1\) to \(\tau _9\) execute the following functions:

  • (\(\tau _1\)) Feeder pushes out cylinder and moves backward/back;

  • (\(\tau _2\)) Converter pneumatic sucker moves right/left;

  • (\(\tau _3\)) Detection Module detects workpiece, height, color and material;

  • (\(\tau _4\)) Shift out cylinder moves backward/forward;

  • (\(\tau _5\)) Elevator elevating cylinder moves down/up;

  • (\(\tau _6\)) Rotating disc workpiece present in position and rotary indexing table has finished a 90 rotation;

  • (\(\tau _7\)) Driller 1 machine drills workpiece;

  • (\(\tau _8\)) Driller 2 machine drills workpiece;

  • (\(\tau _9\)) WarehouseCylinder removes piece from table.

In the following paragraphs, we introduce the meta-model of a task. We study also the dynamic reconfiguration of tasks. After that, we introduce the task scheduling. Finally, we present the task implementation within RTLinux as a Real-Time Operating System.

5.2 A Meta-model Task

In this chapter, we extend the work presented in [42] by studying both a task and a scheduler in a general real-time operating system where each task is characterized by:

  • identifer: each task \(\tau _i\) has a name and an identifier.

  • temporal properties: each task \(\tau _i\) is described by a deadline \(D_i\) (which corresponds to the maximal delay allowed between the release and the completion of any instance of the task), a period \(T_i\), a worst-case execution time \(C_i\). It is released every \(T_i\) seconds and must be able to consume at most \(C_i\) seconds of CPU time before reaching its deadline \(D_i\) seconds after release (\(C_i \le D_i \le T_i\)). We assume that these attributes are known, and given as constants (Table 2).

  • constraints: resources specification \(\rho _i\), precedence constraints and/or QoS properties to be verified.

  • state: A Real-Time Operating System implements a finite state machine for each task and ensures its transition. The state of a task may be in one of the following possible states Ready, Running, Blocked or Terminated. Every task is in one of a few different states at any given time:

    Ready The task is ready to run but waits for allocation of the processor. The scheduler decides which ready task will be executed next based on priority criterion (i.e. the task having the highest priority will be assigned to the processor).

    Blocked A task cannot continue execution because it has to wait (there are many reasons such that waiting for event, waiting on semaphore or a simple delay).

    Running In the running state, the processor is assigned to the task, so that its instructions can be executed. Only one task can be in this state at any time, while all the other tasks can be simultaneously in other states.

    Terminated When a task terminates its execution, the task allocator deletes it and releases the resources taken by this task (Fig. 7).

  • priority: each task is assigned a priority value which may be used in the scheduling.

Table 2 A task set example
Fig. 7
figure 7

Task states

$$\begin{aligned} \tau _i = (D_i; C_i ;T_i;I_i; O_i; \rho _i ; (CC_i^1, \ldots , CC_i^{n_i})); \end{aligned}$$
  • a deadline \(D_i\);

  • an execution time \(C_i\);

  • a period \(T_i\);

  • a set of inputs \(I_i\);

  • a set of outputs \(O_i\);

  • a set of constraints \(\rho _i\);

  • a set of \(n_i\) Control Components (\(n_i \ge 1\)) such that the task \(\tau _i\) is constituted by \(CC_i^1\), \(CC_i^2\), ..., \(CC_i^{n_i}\).

One of the core components of an RTOS is the task scheduler which aims to determine which of the ready tasks should be executing. If there are no ready tasks at a given time, then no task can be executed, and the system remains idle until a task becomes ready (Fig. 8).

Fig. 8
figure 8

Scheduling task

Running Example. In the FESTO Benchmark Production System, when the task \(\tau _1\) is created, it is automatically marked as Ready task. At the instant \(t1\),it is executed by the processor (i.e. it is in the Running state). When the task \(\tau _1\) needs a resource at the instant \(t2\), it becomes blocked. Whenever the resource is available at the instant \(t3\), the task \(\tau _1\) is transformed into ready state. Finally, it is executed again since the time \(t4\). It is terminated at the instant \(t5\) (Fig. 9).

Fig. 9
figure 9

The variation of states related to the task \(\tau _1\)

A scheduler related to a real-time operating system is characterized by (Fig. 10):

  • readyTask: a queue maintaining the set of tasks in ready state.

  • executingTask: a queue maintaining the set of tasks in executing state.

  • minPriority: the minimum priority assigned to a task.

  • maxPriority: the maximum priority assigned to a task.

  • timeSlice: the threshold of preempting a task (the quantity of time assigned to a task before its preemption).

Several tasks may be in the ready or blocked states. The system therefore maintains a queue of blocked tasks and another queue for ready tasks. The latter is maintained in a priority order, keeping the task with the highest priority at the top of the list.When a task that has been in the ready state is allocated the processor, it makes a state transition from ready state to running state. This assignment of the processor is called dispatching and it is executed by the dispatcher which is a part of the scheduler.

Fig. 10
figure 10

The real time operating system

Fig. 11
figure 11

The context switch between tasks

Running Example. In the FESTO Benchmark Production System, we consider three tasks \(\tau _1\), \(\tau _2\) and \(\tau _3\). having as priority \(p1\), \(p2\) and \(p3\) such that \(p1 < p2 < p3\). We suppose that the task \(\tau _1\) is running when the task \(\tau _2\) is created at the instant t1. As a consequence, there is a context switch so that the task \(\tau _1\) stays in a ready state and the other task \(\tau _2\) begins its execution as it has higher priority. At the instant \(t2\), the task \(\tau _3\) which was already blocked waiting a resource, gets the resource. As the task \(\tau _3\) is the highest priority, the task \(\tau _2\) turns into ready state and \(\tau _3\) executes its routine. The task \(\tau _3\) continues processing until it has completed, the scheduler enables \(\tau _2\) to become running (Fig. 11).

5.3 Feasible and Safety Dynamic Reconfiguration of Tasks

We want to study the system’s safety during reconfiguration scenarios. In fact, we want to keep tasks running while dynamically reconfiguring them. We assume for such system’s task several software processes which provide functional services, and assume also reconfiguration processes that apply required modifications such as adapting connections, data or internal behaviors of the component. The execution of these different tasks is usually critical and can lead to incorrect behaviors of the whole system. In this case, we should schedule which process should be firstly activated to avoid any conflict between processes. Consequently, we propose in this section to synchronize processes for coherent dynamic reconfigurations applied to several tasks.

5.3.1 Reconfiguration and Service Processes

We want in this section to synchronize service and reconfiguration processes of a task according to the following constraints: (1) whenever a reconfiguration process is running, any new service process must wait until its termination; (2) a reconfiguration process must wait until the termination of all service processes before it begins its execution; (3) it is not possible to execute many reconfiguration processes in parallel; (4) several service processes can be executed at the same time. To do that, we use semaphores and also the famous synchronization algorithm between readers and writer processes such that executing a service plays the role of a reader process and reconfiguring a task plays the role of a writer process. In the following algorithm, we define serv and reconfig as semaphores to be initialized to 1. The shared variable Nb represents the number of current service processes associated to a specific task. Before the execution of a service related to a task, the service process increments the number Nb (which represents the number of service processes). It tests if it is the first process (i.e. Nb is equal to one). In this case, the operation P(reconfig) ensures that it is not possible to begin the execution if there is a reconfiguration process.

  • P(serv)

  • \(\text {Nb}\leftarrow \text {NB} + 1\)

  • if (NB = 1) then

    • P(reconfig)

  • end if

  • V(serv)

After the execution of a service related to a task, the corresponding process decrements the number \(Nb\) and tests if there is no service process (i.e. \(Nb\) is equal to zero). In this case, the operation \(V(reconfig)\) authorizes the execution of a reconfiguration process.

  • P(serv)

  • \(\text {Nb}\leftarrow \text {NB} - 1\)

  • if (NB = 0) then

    • V(reconfig)

  • end if

  • V(serv)

Consequently, each service process related to a task does the following instructions:

figure g

Running Example. Let us take as a running example the task \(Test\) related to the EnAS system. To test a piece before elevating it, this component permits to launch the Test Service Process. Figure 12 displays the interaction between the objects Test Service Process,Service semaphore and Reconfiguration semaphore. The flow of events from the point of view of Test Service Process is the following: (1) the operation \(P(serv)\) leads to enter in critical section for Service semaphore; (2) the number of services is incremented by one; (3) if it is the first service, then the operation \(P(reconfig)\) permits to enter in critical section for Reconfiguration semaphore; (4) the operation \(V(serv)\) leads to exit from critical section for Service semaphore; (5) the Test Service Process executes the corresponding service; (6) before modifying the number of service, the operation \(P(serv)\) leads to enter in critical section for Service semaphore; (7) the number of services is decremented by one; (8) if there is no service processes, then the operation \(V(reconfig)\) permits to exit from critical section for Reconfiguration semaphore; (9) the operation \(V(serv)\) leads to liberate Service semaphore from its critical section.

Fig. 12
figure 12

The service process scenario

With the operation \(P(reconfig)\), a reconfiguration process verifies that there is no reconfiguration processes nor service processes which are running at the same time. After that, the reconfiguration process executes the necessary steps and runs the operation \(V(reconfig)\) in order to push other processes to begin their execution. Each reconfiguration process specific to a task realizes the following instructions:

figure h
Fig. 13
figure 13

The reconfiguration process scenario

Running Example. Let us take as example the task \(Elevate\) related to EnAS system. The agent needs to reconfigure this task which permits to launch the Elevate Reconfiguration Process. Figure 13 displays the interaction between the following objects Elevate Reconfiguration Process and Reconfiguration semaphore. The flow of events from the point of view of Elevate Reconfiguration Process is the following: (1) the operation \(P(reconfig)\) leads to enter in critical section for Reconfiguration semaphore; (2) the Elevate Reconfiguration Process executes the corresponding reconfiguration; (3) the operation \(V(reconfig)\) leads to liberate Reconfiguration semaphore from its critical section.

5.3.2 Verification of Safety of the Synchronization

To verify the safety of the synchronization, we should verify if the different constraints mentioned above are respected.

First property: whenever a reconfiguration process is running, any service processes must wait until the termination of the reconfiguration. Let us suppose that there is a reconfiguration process (so the integer \(reconfig\) is equal to zero and the number of current services is zero). When a service related to this component is called, the number of current services is incremented (i.e. it is equal to 1) therefore the operation \(P(reconfig)\) leads the process to be in a blocked state (as the integer \(reconfig\) is equal to zero). When the reconfiguration process terminates the reconfiguration, the operation \(V(reconfig)\) permits to liberate the first process waiting in the semaphore queue. In conclusion, this property is validated.

Second property: whenever a service process is running, any reconfiguration processes must wait until the termination of the service. Let us suppose that there is a service process related to a component (so the number of services is greater or equal to one which means that the operation \(P(reconfig)\) is executed and \(reconfig\) is equal to zero). When a reconfiguration is applied, the operation \(P(reconfig)\) leads this process to be in a blocked state (as the \(reconfig\) is equal to zero). Whenever the number of service processes becomes equal to zero, the operation \(V(reconfig)\) allows to liberate the first reconfiguration process waiting in the semaphore queue. As a conclusion, this property is verified.

Third property: whenever a reconfiguration process is running, it is not possible to apply a new reconfiguration process until the termination of the first one. Let us suppose that a reconfiguration process is running (so \(reconfig\) is equal to zero). Whenever, a new reconfiguration process tries to execute, the operation \(P(reconfig)\) puts it into a waiting state. After the reconfiguration process which is running is terminated, the operation \(V(reconfig)\) allows to liberate the first reconfiguration waiting process. Consequently, this property is respected.

Fourth property: whenever a service process is running, it is possible to apply another process service. Let us suppose that a service process \(P1\) is running. Whenever, a new service process \(P2\) tries to begin the execution, the state of \(P2\) (activated or blocked) depends basically on the process \(P1\):

  • if \(P1\) is testing the shared data \(Nb\), then the operation \(P(serv)\) by the process \(P2\) leads it to a blocking state. When the process \(P1\) terminates the test of the shared data \(Nb\), the operation \(V(serv)\) allows to launch the process waiting in the semaphore’s queue.

  • if \(P1\) is executing its service, then the operation \(P(serv)\) by the process \(P2\) allows to execute normally.

Thus, this property is validated.

5.4 Task Scheduling with Priority Ceiling Protocol

How to schedule periodic tasks with precedence and mutual exclusion constraints is considered as important as how to represent a task in a general real-time operating system. In our context, we choose the priority-driven preemptive scheduling used in the most real-time operating systems. The semaphore solution can lead to the problem of priority inversion which consists that a high priority task can be blocked by a lower priority task. To avoid such problem, we propose to apply the priority inheritance protocol proposed by Sha et al. [49].

The priority inheritance protocol can be used to schedule a set of periodic tasks having exclusive access to common resources protected by semaphores. To do so, each semaphore is assigned a priority ceiling which is equal to the highest priority task using this semaphore. A task \(\tau _i\) is allowed to enter its critical section only if its assigned priority is higher than the priority ceilings of all semaphores currently locked by tasks other than \(\tau _i\).

Schedulability test for the priority ceiling protocol: a set of n periodic tasks using the priority ceiling protocol can be scheduled by the rate-monotonic algorithm if the following inequalities hold, \(\forall i, 1 \le i \le n,\)

$$\begin{aligned} C_1 / T_1 + C_2 / T_2 + \cdots + C_i / T_i + B_i / T_i \le i(2^{1 / i}-1) \end{aligned}$$

where \(B_i\) denotes the worst-case blocking time of a task \(\tau _i\) by lower priority tasks.

Fig. 14
figure 14

The priority ceiling protocol applied to three tasks R1, S1 and S2

Table 3 The event and its corresponding action in Fig. 14

Running Example. In the FESTO Benchmark Production System, we consider three tasks \(R1\) (a reconfiguration task), \(S1\) and \(S2\) (service tasks) having as priority \(p1\), \(p2\) and \(p3\) such that \(p1 > p2 > p3\). The sequence of processing steps for each task is as defined in the section previous paragraph where S (resp. R) denotes the service (resp. reconfiguration) semaphore:

$$\begin{aligned} R1&= \{ \ldots P(R)\,execute\,reconfiguration\,V(R) \ldots \} \\ S1&= \{ \ldots P(S) \ldots P(R) \ldots V(S)\,execute\,service\,P(S) \ldots V(R) \ldots V(S) \ldots \} \\ S2&= \{ \ldots P(S) \ldots P(R) \ldots V(S)\,execute\,service\,P(S) \ldots V(R) \ldots V(S) \ldots \} \end{aligned}$$

Therefore, the priority ceiling of the semaphore R is equal to the task R1 (because the semaphore R is used by the tasks R1, S1 and S2 and we know that the task R1 is the highest priority) and the priority ceiling of the semaphore S is equal to the task S1 (because the semaphore S is used by the tasks S1 and S2 and the priority task of S1 is higher). We suppose that the task \(S2\) is running when the task \(S1\) is created at the instant t3. We suppose also that the task \(R1\) is created at the instant t5. Fig. 14 , a line in a high level indicates that the task is executing, a line in a low level indicates that the the task is blocked or preempted by another task. Table 3 explains more in details the example.

6 Conclusion

This chapter deals with Safety Reconfigurable Embedded Control Systems. We propose conceptual models for the whole component-based architecture. We define a multi-agent architecture where a Reconfiguration Agent is affected to each device of the execution environment to handle local automatic reconfigurations, and a Coordination Agent is defined to guarantee safe distributed reconfigurations. To deploy a Control Component in a Real-Time Operating System, we define the concept of real-time task in general (especially its characteristics). The dynamic reconfiguration of tasks is ensured through a synchronization between service and reconfiguration processes to be applied. We propose to use the semaphore concept for this synchronization such that we consider service processes as readers and reconfiguration processes as writers. We propose to use the priority ceiling protocol as a method to ensure the scheduling between periodic tasks with precedence and mutual exclusion constraints. The main contributions presented through this work are: the study of Safety Reconfigurable Embedded Control Systems from the functional to the operational level and the definition of a real-time task independently from any real-time operating system as well as the scheduling of these real-time tasks considered as periodic tasks with precedence and mutual exclusion constraints. The chapter’s contribution is applied to two Benchmark Production Systems FESTO and EnAS available at Martin Luther University in Germany.