1 Introduction

Cyber-Physical Systems (CPS) combine computational and physical entities that interact with sophisticated and unpredictable environments via sensors and actuators. To cost-efficiently study their behavior, engineers typically apply model-based development methodologies, which combine modeling and simulation activities with prototyping. The successful development of CPS is thus strongly dependent on the quality and correctness of their models.

CPS models can be extremely complex: they may include hundreds of variables, signals, look-up tables and components, combining continuous and discrete dynamics. Verification and testing activities are thus of critical importance to detect problems in the models early [2, 5, 7, 16, 17], before they propagate to the actual CPS. Discovering faults is, however, only part of the problem. Due to their complexity, debugging CPS models by identifying the causes of failures can be as challenging as detecting the failures themselves [15].

CPS functionalities are often modelled using the MathWorks\(^\mathrm{TM}\) Simulink environment, where falsification-based testing can be used to find bugs in Simulink/Stateflow models [2, 22, 25]. This approach quantifies (by monitoring [4]) how close a simulated trace of CPS behavior is to violating a requirement expressed in a formal specification language, such as Signal Temporal Logic (STL) [20]. This measure enables a systematic exploration of the input space in search of an input sequence responsible for a violation. However, this method does not provide any suitable information about which components should be inspected to resolve the violation. Trace diagnostics [10] identifies (small) segments of the observable model behavior that are sufficient to imply the violation of the formula, thus providing a failure explanation at the input/output model interface level. However, this is a black-box technique that does not attempt to delve into the model and explain the failure in terms of its internal signals and components.

In this paper, we advance the state of knowledge in failure analysis of CPS models by presenting CPSDebug, a debugging technique that combines testing, specification mining, and failure analysis to identify the causes of failures. CPSDebug first exercises the CPS model under analysis by running the available test cases, discriminating between passing and failing executions using requirements formalized as a set of STL formulas. While running the test cases, CPSDebug records information about the internal behavior of the CPS model; in particular, it collects the values of all internal system variables at every timestamp. The values collected from passing test cases are used to infer properties about the variables and components involved in the computations. These properties capture the correct behavior of the system.

CPSDebug checks the mined properties against the traces collected from failed test cases to discover the internal variables, and their corresponding components, that are responsible for the violation of the requirements. Finally, failure evidence is analyzed using trace diagnostics [10] and clustering [12] to produce a time-ordered sequence of snapshots that shows where the anomalous variable values originated and how they propagated within the system.

CPSDebug thus overcomes the limitation of state-of-the-art approaches that do not guide engineers in the analysis, but only indicate the inputs or code locations that might be responsible for the failure. The sequence of snapshots returned by CPSDebug provides a step-by-step illustration of the failure with an explicit indication of the faulty behaviors. We evaluated CPSDebug against three classes of faults and two actual CPS models. Results suggest that CPSDebug can effectively and efficiently assist developers in their debugging tasks. The feedback that we collected from industry engineers further confirmed that the output produced by CPSDebug can indeed be valuable to ease failure analysis and debugging of CPS models.

The rest of the paper is organized as follows. We provide background information in Sect. 2 and we describe the case study in Sect. 3. In Sect. 4 we present our approach for failure explanation while in Sect. 5 we provide the empirical evaluation. We discuss the related work in Sect. 6 and we draw our conclusions in Sect. 7.

2 Background

2.1 Signals and Signal Temporal Logic

We define \(S= \{s_1,\ldots ,s_n\}\) to be a set of signal variables. A signal or trace w is a function \(\mathbb {T}\rightarrow \mathbb {R}^n\), where \(\mathbb {T}\) is the time domain in the form of \([0,d]\subset \mathbb {R}\). We can also see a multi-dimensional signal w as a vector of real-valued uni-dimensional signals \(w_i : \mathbb {T}\rightarrow \mathbb {R}\) associated with variables \(s_i\) for \(i=1,\ldots ,n\). We assume that every signal \(w_i\) is piecewise-linear. Given two signals \(u : \mathbb {T}\rightarrow \mathbb {R}^l\) and \(v : \mathbb {T}\rightarrow \mathbb {R}^m\), we define their parallel composition \(u \Vert v : \mathbb {T}\rightarrow \mathbb {R}^{l+m}\) in the expected way. Given a signal \(w : \mathbb {T}\rightarrow \mathbb {R}^n\) defined over the set of variables \(S\) and a subset of variables \(R\subseteq S\), we denote by \(w_{R}\) the projection of w to \(R\), where \(w_{R} = \Vert _{s_i \in R} w_i\).
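As a concrete (and purely illustrative) reading of these definitions, a sampled multi-dimensional signal can be stored as a matrix with one column per variable; the helper names below are ours, not part of the formalism.

```python
import numpy as np

def project(w, names, subset):
    """Projection w_R of a sampled signal w (an (N, n) array, one column per
    variable in `names`) onto the variables in `subset`, preserving their order."""
    idx = [names.index(s) for s in subset]
    return w[:, idx]

def compose(u, v):
    """Parallel composition u || v of two signals sampled on the same time steps."""
    return np.hstack([u, v])

# Example: a signal over variables s1, s2 sampled at three time points
w = np.array([[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]])
print(project(w, ["s1", "s2"], ["s2"]))   # keeps only the s2 column
```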

Let \(\varTheta \) be a set of terms of the form \(f(R)\) where \(R\subseteq S\) are subsets of variables and \(f : \mathbb {R}^{|R|} \rightarrow \mathbb {R}\) are interpreted functions. The syntax of STL with both future and past operators is defined by the grammar:

\[ \varphi \,{:}{=}\, f(R) \ge 0 \;\mid \; \lnot \varphi \;\mid \; \varphi _1 \vee \varphi _2 \;\mid \; \varphi _1 \,\mathcal {U}_I\, \varphi _2 \;\mid \; \varphi _1 \,\mathcal {S}_I\, \varphi _2 \]

where f(R) are terms in \(\varTheta \) and I are real intervals with bounds in \(\mathbb Q_{\ge 0} \cup \{\infty \}\). As customary, we use the shorthands \(\Diamond _I\) for eventually, \(\Box _I\) for always, \(\Diamond ^-_I\) for once, \(\Box ^-_I\) for historically, \(\uparrow \) for rising edge and \(\downarrow \) for falling edge. We interpret STL with its classical semantics defined in [19].
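For reference, the derived operators can be expressed in terms of until and since; a sketch of the standard definitions (the exact variants of the edge operators may differ in detail from those used here) is:

```latex
\Diamond_I\,\varphi \;\equiv\; \top \,\mathcal{U}_I\, \varphi \qquad
\Box_I\,\varphi \;\equiv\; \lnot \Diamond_I \lnot\varphi \qquad
\Diamond^{-}_I\,\varphi \;\equiv\; \top \,\mathcal{S}_I\, \varphi \qquad
\Box^{-}_I\,\varphi \;\equiv\; \lnot \Diamond^{-}_I \lnot\varphi
```

Intuitively, \(\uparrow \varphi \) holds at the instants where \(\varphi \) switches from false to true, and \(\downarrow \varphi \) where it switches from true to false.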

2.2 Daikon

Daikon is a template-based property inference tool that, starting from a set of variables and a set of observations, can infer a set of properties that are likely to hold for the input variables [9]. More formally, given a set of variables \(V=V_1,\ldots ,V_n\) defined over the domains \(D_1,\ldots ,D_n\), an observation for these variables is a tuple \(\overline{v}=(v_1, \ldots , v_n)\), with \(v_i \in D_i\).

Given a set of variables V and multiple observations \(\overline{v}_1 \ldots \overline{v}_m\) for these same variables, Daikon is a function \(D(V,\overline{v}_1 \ldots \overline{v}_m)\) that returns a set of properties \(\{p_1, \ldots p_k\}\) such that \(\overline{v}_i \models p_j\) for all i, j, that is, all the observations satisfy the inferred properties. For example, considering two variables x and y and the observations (1, 3), (2, 2), (4, 0) for the tuple (x, y), Daikon can infer properties such as \(x>0\), \(x+y=4\), and \(y\ge 0\).

The inference of the properties is driven by a set of template operators that Daikon instantiates over the input variables and checks against the input data. Since template-based inference can generate redundant and implied properties, Daikon automatically detects them and reports only the relevant properties. Finally, to guarantee that the inferred properties are relevant, Daikon computes, for each property, the probability that it holds by chance. Only properties that are statistically significant, with a confidence higher than 0.99, are assumed to be reliable and reported in the output.
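The following minimal sketch illustrates the idea of template-based inference on the example above; the templates and the function name are ours and mimic only a tiny fraction of Daikon, without redundancy elimination or statistical filtering.

```python
from itertools import combinations

def infer_invariants(names, observations):
    """Toy template-based inference in the spirit of Daikon: instantiate a few
    fixed templates over the variables and keep those satisfied by every
    observation (no redundancy elimination or confidence filtering)."""
    cols = {name: [obs[i] for obs in observations] for i, name in enumerate(names)}
    props = []
    for name, vals in cols.items():                      # unary templates
        if all(v > 0 for v in vals):
            props.append(f"{name} > 0")
        if all(v >= 0 for v in vals):
            props.append(f"{name} >= 0")
        if len(set(vals)) == 1:
            props.append(f"{name} == {vals[0]}")
    for a, b in combinations(names, 2):                  # binary templates
        if all(x == y for x, y in zip(cols[a], cols[b])):
            props.append(f"{a} == {b}")
        sums = {x + y for x, y in zip(cols[a], cols[b])}
        if len(sums) == 1:
            props.append(f"{a} + {b} == {sums.pop()}")
    return props

# Observations from the example in the text, for the tuple (x, y)
print(infer_invariants(["x", "y"], [(1, 3), (2, 2), (4, 0)]))
# -> ['x > 0', 'x >= 0', 'y >= 0', 'x + y == 4']
```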

In our approach, we use Daikon to automatically generate properties that capture the behavior of the individual components and individual signals in the model under analysis. These properties can be used to precisely detect misbehaviours and their propagation.

3 Case Study

We now introduce a case study that we use as a running example to illustrate our approach step by step. We consider the Aircraft Elevator Control System (AECS) introduced in [11] to illustrate model-based development of a Fault Detection, Isolation and Recovery (FDIR) application for a redundant actuator control system.

Fig. 1. Aircraft elevator control system [11].

Figure 1 shows the architecture of an aircraft elevator control system with redundancy, with one elevator on the left and one on the right side. Each elevator is equipped with two hydraulic actuators. Both actuators can position the elevator, but only one shall be active at any point in time. There are three different hydraulic systems that drive the four actuators. The left (LIO) and right (RIO) outer actuators are controlled by a Primary Flight Control Unit (PFCU1) with a sophisticated input/output control law. If a failure occurs, a less sophisticated Direct-Link (DL/PFCU2) control law with reduced functionality takes over to handle the left (LDL) and right (RDL) inner actuators. The system uses state machines to coordinate the redundancy and assure its continual fail-operational activity.

This model has one input variable, the Pilot Command, and two output variables, the positions of the left and right actuators, as measured by the sensors. This is a complex model that could be extremely hard to analyze in case of failure. In fact, the model has 426 signals, of which 361 are instrumented internal variables (279 real-valued, 62 Boolean and 20 enumerated state-machine variables), and any of them, or even a combination of them, might be responsible for an observed failure.

The model comes with a failure injection mechanism, which allows failures representing hardware/ageing problems to be dynamically inserted into different components of the system during its simulation. This mechanism allows the insertion of (1) low pressure failures for each of the three hydraulic systems, and (2) failures of the sensor position components in each of the four actuators. Due to the use of redundancy in the design of the control system, a single failure is not sufficient to alter its intended behavior. In some cases even two failures are not sufficient to produce faulty behaviors. For instance, the control system is able to function correctly when a left and a right sensor position component fail simultaneously. This challenges the understanding of failures because there are multiple causes that must be identified to explain a single failure.

To present our approach we consider the analysis of a system failure caused by the activation of two failures: the sensor measuring Left Outer Actuator Position failing at time 2 and the sensor measuring Left Inner Actuator Position failing at time 4. To collect evidence of how the system behaves, we executed the Simulink model with 150 test cases with different pilot commands and collected the input-output behavior both with and without the failures.

When the system behaves correctly, the intended position of the aircraft required by the pilot must be achieved within a predetermined time limit and with a certain accuracy. This can be captured with several requirements. One of them says that whenever Pilot Command \(cmd \) goes above a threshold m, the actuator position measured by the sensor must stabilize (become at most n units away from the command signal) within \(T+t\) time units. This requirement is formalized in STL with the following specification:

\[ \varphi = \Box \big (\uparrow (cmd \ge m) \rightarrow \Diamond _{[0,T]}\, \Box _{[0,t]}\, (|cmd - pos | \le n)\big ) \qquad (1) \]
Fig. 2. Expected behavior of AECS.

Fig. 3. Failure of the AECS.

Figures 2 and 3 show the correct and the faulty behavior of the system, respectively. The control system clearly stops following the reference signal after 4 seconds. The failure observed at the input/output interface of the model gives no indication of which part of the model is responsible for the property violation. In the next section, we present how our failure explanation technique addresses this case, producing a valuable output for engineers.

4 Failure Explanation

In this section we describe how CPSDebug works with the help of the case study introduced in Sect. 3. Figure 4 illustrates the main steps of the workflow. Briefly, the workflow starts from a target CPS model and a test suite with some passing and failing test cases, and produces a failure explanation for each failing test case. The workflow consists of three sequential phases:

Fig. 4. Overview of the failure explanation procedure.

  (i) Testing, which simulates the instrumented CPS model with the available test cases to collect information about its behavior, both for passing and failing executions;

  (ii) Mining, which mines properties from the traces produced by passing test cases; intuitively these properties capture the expected behavior of the model;

  (iii) Explaining, which uses mined properties to analyze the traces produced by failures and generate failure explanations, including information about the root events responsible for the failure and their propagation.

4.1 Testing

CPSDebug starts by instrumenting the CPS model. This is an important pre-processing step that is performed before testing the model and that allows the internal signals of the model to be logged. Model instrumentation is inductively defined on the hierarchical structure of the Simulink/Stateflow model and is performed in a bottom-up fashion. For every signal variable of real, Boolean or enumeration type, CPSDebug assigns a unique name to it and makes the simulation engine log its values. Similarly, CPSDebug instruments look-up tables and state machines. Each look-up table is associated with a dedicated variable which is used to produce a simulation trace that reports the unique cell index that is exercised by the input at every point in time. CPSDebug also instruments state machines by associating two dedicated variables with each state machine, one reporting the transitions taken and one reporting the locations visited during the simulation. We denote by V the set of all instrumented model variables.

The first step of the testing phase, namely Model Simulation, runs the available test cases \(\{w_I^k | 1 \le k \le n\}\) against the instrumented version of the simulation model under analysis. The number of available test cases may vary case by case, for instance in our case study the test suite included \(n=150\) tests.

The result of the model simulation consists of one simulation trace \(w^k\) for each test case \(w_I^k\). The trace \(w^k\) stores the sequence of (simulation time, value) pairs \(w^k_v\) for every instrumented variable \(v \in V\) collected during simulation.

To determine the nature of each trace, we transform the informal model specification, which is typically provided in the form of free text, into an STL formula \(\varphi \) that can be automatically evaluated by a monitor. CPSDebug checks every trace \(w^k\), \(1 \le k \le n\), against the STL formula \(\varphi \) and labels the trace with a pass verdict if \(w^k\) satisfies \(\varphi \), or with a fail verdict otherwise. In our case study, the STL formula (1) in Sect. 3 labeled 149 traces as passing and 1 trace as failing.
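As a minimal illustration of this labeling step, the sketch below checks a uniformly sampled trace against a discrete-time Boolean approximation of requirement (1); the thresholds m, n, T, t, the signal names and the function names are placeholders, not the actual monitor used by CPSDebug.

```python
import numpy as np

def satisfies_req1(time, cmd, pos, m=0.5, n=0.1, T=5.0, t=1.0):
    """Boolean check of (1): whenever cmd rises above m, |cmd - pos| <= n must
    hold on some window of length t that starts within T time units."""
    time, cmd, pos = map(np.asarray, (time, cmd, pos))
    close = np.abs(cmd - pos) <= n
    for i in range(1, len(time)):
        if not (cmd[i] >= m and cmd[i - 1] < m):        # rising edge of (cmd >= m)
            continue
        stabilized = False
        for j in range(i, len(time)):
            if time[j] > time[i] + T:                   # candidate start beyond T
                break
            window = (time >= time[j]) & (time <= time[j] + t)
            if close[window].all():
                stabilized = True
                break
        if not stabilized:
            return False                                # this rising edge violates (1)
    return True

def verdict(trace):
    """Label a trace, given as a dict of sampled signals (illustrative layout)."""
    return "pass" if satisfies_req1(trace["time"], trace["cmd"], trace["pos"]) else "fail"
```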

4.2 Mining

In the mining phase, CPSDebug selects the traces labeled with a pass verdict and exploits them for property mining.

Prior to the property inference, CPSDebug performs several intermediate steps that facilitate the mining task. First, CPSDebug reduces the set of variables V to a subset \(\hat{V}\) of significant variables by using cross-correlation. Intuitively, the presence of two highly correlated variables implies that one variable adds little information on top of the other, and thus the analysis may focus on one variable only. The approach initializes \(\hat{V}=V\) and then checks the cross-correlation coefficient between all pairs of logged variables, computed on the data obtained from the passing traces. The cross-correlation coefficient \(P(v_1,v_2)\) between two variables \(v_1\) and \(v_2\) is computed with the Pearson method, i.e. \(P(v_1,v_2) = \frac{cov(v_1,v_2)}{\sigma _{v_{1}}\sigma _{v_{2}}}\), which is defined in terms of the covariance of \(v_1\) and \(v_2\) and their standard deviations. Whenever the cross-correlation coefficient between two variables is higher than 0.99, that is \(P(v_1, v_2)>0.99\), CPSDebug removes one of the two variables (and its associated traces) from further analysis, that is, \(\hat{V}=\hat{V} \setminus \{v_1\}\). In our case study, \(|V| = 361\) and \(|\hat{V}| = 121\), resulting in a reduction of 240 variables.
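A minimal sketch of this reduction step, assuming the passing traces are available as a dictionary mapping each variable name to its concatenated samples (the names and the threshold default are illustrative):

```python
import numpy as np

def reduce_variables(traces, threshold=0.99):
    """Drop one variable of every pair whose Pearson correlation on the passing
    traces exceeds the threshold, keeping the other for further analysis."""
    names = list(traces.keys())
    kept = set(names)
    for i, v1 in enumerate(names):
        for v2 in names[i + 1:]:
            if v1 not in kept or v2 not in kept:
                continue
            x, y = np.asarray(traces[v1]), np.asarray(traces[v2])
            if x.std() == 0 or y.std() == 0:
                continue                          # constant signal: correlation undefined
            if np.corrcoef(x, y)[0, 1] > threshold:
                kept.discard(v1)                  # v2 carries (almost) the same information
                break
    return kept
```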

In the next step, CPSDebug associates each variable \(v \in \hat{V}\) with (1) its domain D and (2) its parent Simulink block B. We denote by \(V_{D,B} \subseteq \hat{V}\) the set \(\{v_{1}, \ldots , v_{n} \}\) of variables with domain D associated with block B. CPSDebug collects all observations \(\overline{v}_1 \ldots \overline{v}_m\) from all samples in all traces associated with variables in \(V_{D,B}\) and uses the Daikon function \(D(V_{D,B}, \overline{v}_1 \ldots \overline{v}_m)\) to infer a set of properties \(\{p_{1}, \ldots , p_{k}\}\) related to the block B and the domain D. Running property mining per model block and per domain avoids (1) a combinatorial explosion of learned properties and (2) learning properties between incompatible domains.
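The per-block, per-domain organization of the mining step can be sketched as follows; the variable attributes and the run_daikon callback are assumptions standing in for the instrumentation metadata and the actual Daikon invocation.

```python
from collections import defaultdict

def mine_per_block_and_domain(variables, run_daikon):
    """Group the retained variables by (parent block, domain) and mine each
    group separately to keep the learned properties small and well-typed."""
    groups = defaultdict(list)
    for v in variables:                     # v is assumed to carry .name, .block, .domain, .samples
        groups[(v.block, v.domain)].append(v)
    properties = {}
    for (block, domain), vs in groups.items():
        names = [v.name for v in vs]
        observations = list(zip(*[v.samples for v in vs]))   # one tuple per time step
        properties[(block, domain)] = run_daikon(names, observations)
    # each mined property p is later wrapped into the STL assertion "always p"
    return properties
```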

Finally, CPSDebug collects all the learned properties from all the blocks and the domains, and translates them into an STL specification, where each Daikon property p is transformed into an STL assertion of the form \(\Box \, p\).

In our case study, Daikon returned 96 behavioral properties involving 121 variables, hence CPSDebug generated an STL property \(\psi \) with 96 temporal assertions, i.e., \(\psi = [\psi _1 \, \psi _2 \, ... \, \psi _{96}]\). Equations (2) and (3) show two examples of behavioral properties inferred from our case study by Daikon and translated to STL. Variables \(mode \), \(LI\_pos\_fail \) and \(LO\_pos\_fail \) denote the internal signals Mode, Left Inner Position Failure and Left Outer Position Failure from the aircraft position control Simulink model. The first property states that the Mode signal is always in state 2 (Passive) or 3 (Standby), while the second property states that the Left Inner Position Failure signal always has the same value as the Left Outer Position Failure signal.

\[ \psi _1 = \Box \, (mode = 2 \vee mode = 3) \qquad (2) \]
\[ \psi _2 = \Box \, (LI\_pos\_fail = LO\_pos\_fail ) \qquad (3) \]

4.3 Explaining

This phase analyzes a trace w collected from a failing execution and produces a failure explanation. The Monitoring step analyzes the trace against the mined properties and returns the signals that violate the properties and the time intervals in which the properties are violated. CPSDebug subsequently labels with F (fail) the internal signals involved in the violated properties and with P (pass) the remaining signals from the trace. To each fail-annotated signal, CPSDebug also assigns the violation time intervals of the corresponding violated properties returned by the monitoring tool.
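A minimal sketch of this monitoring and labeling step, assuming a callback that returns the violation intervals of a single mined assertion on the failing trace (the data layout and names are illustrative); the earliest violation time per signal is kept for the next step:

```python
def annotate(trace_signals, mined_props, violation_intervals):
    """Label each internal signal P (pass) or F (fail) and record the earliest
    time at which any property involving that signal is violated."""
    labels = {s: "P" for s in trace_signals}
    first_violation = {}
    for prop in mined_props:                         # prop.signals: variables used by prop
        intervals = violation_intervals(prop)        # [] if the property holds on the trace
        if not intervals:
            continue
        start = min(lo for lo, _ in intervals)
        for s in prop.signals:
            labels[s] = "F"
            first_violation[s] = min(first_violation.get(s, start), start)
    return labels, first_violation
```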

Table 1. Internal signals that violate at least one learned invariant and Simulink blocks to which they belong. The column \(\tau (s)\) denotes the first time that each signal participates in an invariant violation.

In our case study, the analysis of the left inner and the left outer sensor failure resulted in the violation of 17 mined properties involving 19 internal signals.

For each internal signal there can be several fail-annotated signal instances, each one with a different violation time interval. CPSDebug selects the instance that occurs first in time, ignoring all other instances. This is because, to reach the root cause of a failure, CPSDebug has to focus on the events that cause observable misbehaviours first.

Table 1 summarizes the set of property-violating signals, the block they belong to, and the instant of time the signal has first violated a property for our case study. We can observe that the 17 signals participating in the violation of at least one mined property belong to only 5 different Simulink blocks. In addition, we can see that all the violations naturally cluster around two time instants – 2 s and 4 s. This suggests that CPSDebug can effectively isolate in space and time a limited number of events likely responsible for the failure.

Fig. 5. Failure explanation as a sequence of snapshots - part of the first snapshot.

The Clustering & Mapping step then (1) clusters the resulting fail-annotated signal instances by their violation time intervals and (2) maps them to the corresponding model blocks, i.e., to the model blocks that have some of the fail-annotated signal instances as internal signals.
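A minimal sketch of this step, assuming each fail-annotated instance carries its first violation time, signal name, owning block and violated property (the gap threshold and attribute names are illustrative); the resulting clusters are the basis for the snapshots described next:

```python
def cluster_and_map(fail_instances, gap=1.0):
    """Group fail-annotated signal instances whose violation times lie within
    `gap` seconds of each other (simple 1-D clustering) and summarize each
    cluster by its mean time and the blocks, properties and signals involved."""
    if not fail_instances:
        return []
    inst = sorted(fail_instances, key=lambda x: x.time)     # x.time, x.signal, x.block, x.prop
    clusters, current = [], [inst[0]]
    for prev, cur in zip(inst, inst[1:]):
        if cur.time - prev.time <= gap:
            current.append(cur)
        else:
            clusters.append(current)
            current = [cur]
    clusters.append(current)
    return [{
        "time": sum(x.time for x in c) / len(c),            # mean violation time
        "blocks": sorted({x.block for x in c}),
        "properties": sorted({x.prop for x in c}),
        "signals": sorted({x.signal for x in c}),
    } for c in clusters]
```

On the case study, such a grouping yields two clusters, around 2 s and 4 s, matching the two time instants reported in Table 1.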

Finally, CPSDebug generates failure explanations that capture how the fault originated and propagated in space and time. In particular, the failure explanation is a sequence of snapshots of the system, one for each cluster of property violations. Each snapshot reports (1) the mean time, as an approximation of when the violations represented in the cluster occurred, (2) the model blocks \(\{B_1,...,B_p\}\) that originate the violations reported in the cluster, (3) the properties violated by the cluster, representing the reason why the cluster of anomalies exists, and (4) the internal signals that participate in the violations of the properties associated with the cluster. Intuitively, a snapshot represents a new relevant state of the system, and the sequence shows how the execution progresses from the violation of a set of properties to the final violation of the specification. The engineer is supposed to exploit the sequence of snapshots to understand the failure, and the first snapshot to localize the root cause of the problem. Figure 5 shows the first snapshot of the failure explanation that CPSDebug generated for the case study. We can see that the explanation of the failure at time 2 involves the Sensors block, and propagates to the Signal conditioning and failures and Controller blocks. By opening the Sensors block, we can immediately see that something is wrong with the sensor that measures the left inner position of the actuator. Going one level below, we can see that the signal \(s_{252}\) produced by \(LI\_pos\_fail \) is suspicious – indeed the fault was injected exactly in that block at time 2. It is not a surprise that the malfunctioning of the sensor measuring the left inner position of the actuator affects the Signal conditioning and failures block (the block that detects whether a sensor has failed) and the Controller block. However, at time 2 the failure in one sensor does not yet affect the correctness of the overall system, hence the STL specification is not yet violated. The second snapshot (not shown here) generated by CPSDebug reveals that the sensor measuring the left outer position of the actuator fails at time 4. The redundancy mechanism is not able to cope with multiple sensor faults, hence anomalies manifest in the observable behavior. From this sequence of snapshots, the engineer can conclude that the problem is in the failure of the two sensors - one measuring the left inner and the other measuring the left outer position of the actuator - that stop functioning at times 2 and 4, respectively.

5 Empirical Evaluation

We empirically evaluated our approach against three classes of faults: multiple hardware faults in fault-tolerant systems, which is the case of multiple components that incrementally fail in a system designed to tolerate multiple malfunctioning units; incorrect look-up tables, which is the case of look-up tables containing incorrect values; and erroneous guard conditions, which is the case of imprecise conditions in the transitions that determine the state-based behavior of the system. Note that these classes of faults are highly heterogeneous. In fact, their analysis requires a technique flexible enough to deal with multiple failure causes, but also with the internal structure of complex data structures and, finally, with state-based models.

We consider two different systems to introduce faults belonging to these three classes. We use the fault-tolerant aircraft elevator control system [11] presented in Sect. 3 to study the capability of our approach to identify failures caused by multiple overlapping faults. In particular, we study cases obtained by (1) injecting a low pressure fault into two out of three hydraulic components (fault \(h_{1}h_{2}\)), and (2) inserting a fault in the left inner and left outer sensor position components (fault \(lilo \)).

We use the automatic transmission control system [13] to study the other classes of faults. The automatic transmission control system is composed of 51 variables, includes 4 look-up tables of size between 4 and 110, and contains two finite state machines running in parallel with 3 and 4 states, respectively, as well as 6 transitions each. We used the 7 STL specifications defined in [13] to reveal failures in this system. We studied cases obtained by (1) modifying a transition guard in the StateFlow chart (fault \(guard \)), and (2) altering an entry in the look-up table Engine (fault \(eng\_lt \)).

To study these faults, we considered two usage scenarios. For the aircraft elevator control system, we executed 150 test cases in which we systematically changed the amplitude and the frequency of the pilot command steps. These tests were executed on a non-faulty model. We then executed an additional test on the model into which we dynamically injected the \(h_{1}h_{2}\) and \(lilo \) faults. For the automatic transmission control system, we executed 100 tests in which we systematically changed the step input of the throttle by varying the amplitude, the offset and the absolute time of the step. All the tests were executed on a faulty model. In both cases, we separated the failed tests from the passing tests. CPSDebug used the data collected from the passing tests to infer the models necessary for the analysis of the failed tests.

We evaluated the output produced by our approach considering four main aspects: Scope Reduction, Cause Detection, Quality of the Analysis, and Computation Time. Scope Reduction measures how well our approach narrows down the number of elements to be inspected to a small number of anomalous signals that require the attention of the engineer, in comparison to the set of variables involved in the failed execution. Cause Detection indicates whether the first cluster of anomalous values reported by our approach includes any property violation caused by the signal that is directly affected by the fault. Intuitively, it would be highly desirable that the first cluster of anomalies reported by our technique includes violations caused by the root cause of the failure. For instance, if a fault directly affects the values of the signal Right Inner Pos., we expect these values to cause a violation of a property about this same signal. For Quality of the Analysis, we qualitatively discuss the set of violated properties reported for the various faults and explain why they offer a comprehensive view of the problem that caused the failure. Finally, we analyze the computation time of CPSDebug and its components and compare it to the simulation time of the model.

To further confirm the effectiveness of our approach, we contacted 3 engineers from (1) an automotive OEM with over 300,000 employees (E1), (2) a major modeling and simulation tool vendor with more than 3,000 employees (E2), and (3) an SME that develops tools for verification and testing of CPS models (E3). We asked them to evaluate the outcomes of our tool for a selection of faults (it was infeasible to ask them to inspect all the results we collected). In particular, we sent them the faulty program, an explanation of both the program and the fault, and the output generated by our tool, and we asked them to answer the following questions:

  Q1. How helpful is the output to understand the cause(s) of the failure? (Very useful/Somewhat useful/Useless/Misleading)

  Q2. Would you consider experimenting with our tool on your projects? (Yes/Maybe/No)

  Q3. Considering the sets of violations that have been reported, is there anything that should be removed from the output? (open question)

  Q4. Is there anything more you would like to see in the output produced by our tool? (open question)

In the following, we report the results that we obtained for each of the analyzed aspects.

5.1 Scope Reduction, Cause Detection and Qualitative Analysis

Table 2 shows the degree of reduction achieved for the analyzed faults. Column system indicates the faulty application used in the evaluation. Column # vars indicates the size of the model in terms of the number of its variables. Column fault indicates the specific fault analyzed. Column # \(\psi \) gives the number of learned invariants. Column # suspicious vars (reduction) indicates the number of variables involved in the violated properties and the reduction achieved. Column fault detected indicates whether the explanation included a variable associated with the output of the block in which the fault was injected.

Table 2. Scope reduction and cause detection.

We can see from Table 2 that CPSDebug successfully detected the exact origin of the fault in 3 out of 4 cases. In the case of the aircraft elevator control system, CPSDebug clearly identifies the problem with the respective sensors (fault \(lilo \)) and hydraulic components (fault \(h_1 h_2\)). Overall, the scope reduction ranged from \(90\%\) to \(98\%\) of the model signals, allowing engineers to focus on a small subset of the suspicious signals. Note that a strong scope reduction is useful also when CPSDebug is not effective with a fault, since engineers could quickly conclude that the fault is not in the (few) recommended locations without wasting their time (such as for the guard fault).

In the case of the automatic transmission control system, CPSDebug associates the misbehavior of the model with the Engine look-up table and points to the affected entry. The scope reduction in this case is \(90\%\). On the other hand, CPSDebug misses the exact origin of the \(guard \) fault and fails to point to the altered transition. This happens because the faulty guard alters only the timing, but not the qualitative behavior, of the state machine. Since Daikon is able to learn only invariant properties, CPSDebug is not able to discriminate between passing and failing tests in this case. Nevertheless, CPSDebug does associate the entire state machine with the anomalous behavior, since the observable signal that violates the STL specification is generated by the state machine.

5.2 Computation Time

Table 3 summarizes the computation time of CPSDebug applied to the two case studies. We can draw two main conclusions from these experimental results: (1) the overall computation time of CPSDebug-specific activities is comparable to the overall simulation time, and (2) property mining dominates by far the computation of the explanation. We finally report in the last row the time needed to translate the Simulink simulation traces recorded in the Comma Separated Values (CSV) format into the specific input format used by Daikon. In our prototype implementation of CPSDebug, we use an inefficient format translation that results in excessive time. We believe that with additional effort the translation time can be improved by several orders of magnitude.

Table 3. CPSDebug computation time.

5.3 Evaluation by Professional Engineers

We analyze in this section the feedback provided by engineers E1–E3 to the questions Q1–Q4.

  Q1. E1 found CPSDebug potentially very useful. E2 and E3 found CPSDebug somewhat useful.

  Q2. All engineers said that they would experiment with CPSDebug.

  Q3. None of the engineers found anything that should be removed from the tool outcome.

  Q4. E2 and E3 wished to see better visual highlighting of suspicious signals. E2 wished to see the actual trace for each suspicious signal. E2 and E3 wished for a clearer presentation of cause-effect relations.

Apart from the direct responses to Q1–Q4, we received other useful information. All engineers shared appreciation for the visual presentation of the outcomes, and especially the marking of suspicious Simulink blocks in red. E1 highlighted that real production models typically do not only contain Simulink and StateFlow blocks, but also SimEvent and SimScape blocks, Bus Objects, Model References, Variant Subsystems, etc., which may limit the applicability of the current prototype implementation.

Overall, engineers confirmed that CPSDebug can be a useful technology. At the same time, they offered valuable feedback to improve it, especially the presentation of the output produced by the tool.

6 Related Work

The analysis of software failures has been addressed with two main classes of related approaches: fault localization and failure explanation techniques.

Fault localization techniques aim at identifying the location of the faults that caused one or more observed failures (an extensive survey can be found in [27]). A popular example is spectrum-based fault-localization (SBFL) [1], an efficient statistical technique that, by measuring the code coverage in the failed and successful tests, can rank the program components (e.g., the statements) that are most likely responsible for a fault.

SBFL has been recently employed to localize faults in Simulink/Stateflow CPS models [5, 7, 16, 17, 18], showing accuracy similar to its application to software systems [18]. The explanatory power of this approach is however limited, because it generates neither information that helps engineers understand whether a selected code location is really faulty, nor information about how a fault propagated across components resulting in an actual failure. Furthermore, SBFL is agnostic to the nature of the oracle, requiring only the knowledge of whether the system passes a specific test case. This prevents the exploitation of any additional information concerning why and when the oracle decides that the test does not conform to the desired behavior. Bartocci et al. [5] try to overcome this limitation by assuming that the oracle is a monitor generated from an STL specification. This allows the use of the trace diagnostics method proposed by Ferrère et al. [10] to obtain more information (e.g., the time interval when the cause of violation first occurs) about the failed tests, improving fault localization. Although this additional knowledge can improve the confidence in the localization, still little is known about the root cause of the problem and its impact on the runtime behavior of the CPS model.

CPSDebug complements and improves SBFL techniques: it generates information that helps engineers identify the cause of failures and understand how faults resulted in chains of anomalous events that eventually led to the observed failures, and it produces a corpus of information well-suited to support engineers in their debugging tasks, as confirmed by the subjects who responded to our questionnaire.

Failure explanation techniques analyze software failures in an attempt to produce information about failures and their causes. For instance, a few approaches combined mining and dynamic analysis in the context of component-based and object-oriented applications to reveal [24] and explain failures [3, 6, 21]. These approaches are, however, not straightforwardly applicable to CPS models, since they exploit the discrete nature of component-based and object-oriented applications, which is radically different from the data-flow oriented nature of CPS models, which include mixed-analog signals, hybrid (continuous and discrete) components, and complex dynamics.

CPSDebug originally addresses failure explanation in the context of CPS models. The closest work to CPSDebug is probably Hynger [14, 23], which exploits invariant generation to detect specification mismatches, that is, mismatches between an actual and an inferred specification, in Simulink models. Specification mismatches can indicate the presence of problems in the models. Differently from Hynger, CPSDebug does not compare specifications but exploits the inferred properties to identify anomalous behaviors in observed failures. Moreover, CPSDebug exploits correlation and clustering techniques to keep the output compact and to generate a sequence of snapshots that helps comprehensively define the story of the failure. Our results show that this output can be the basis for cost-effective debugging.

A related body of research consists of approaches for anomaly detection of Cyber-Physical Systems [8, 26]. However, anomaly detection approaches aim at detecting misbehaviours, rather than analyzing failures and detecting their root causes as CPSDebug does.

7 Future Work and Conclusions

We have presented CPSDebug, an automatic approach for explaining failures in Simulink models. Our approach combines testing, specification mining and failure analysis to provide a concise explanation consisting of a time-ordered sequence of model snapshots that show the variables exhibiting anomalous behavior and their propagation in the model. We evaluated the effectiveness of CPSDebug on two models, involving two usage scenarios and several classes of faults.

We believe that this paper opens several research directions. In this work, we only considered mining of invariant specifications. However, we have observed that invariant properties are not sufficient to explain timing issues, hence we plan to experiment in future work with mining of real-time temporal specifications. In particular, we will study the trade-off between the finer characterization of the model that temporal specification mining can provide and its computational cost. We also plan to study systematic ways to explain failures in the presence of heterogeneous components. In this paper, we consider the setting in which we have multiple passing tests, but we only use a single failing test to explain the failure. We will study whether the presence of multiple failing tests can be used to improve the explanations. In this work, we performed manual fault injection, and our focus was on studying the effectiveness of CPSDebug in providing meaningful failure explanations for different usage scenarios and classes of faults. We plan in the future to develop automatic fault injection and perform systematic experiments to evaluate how often CPSDebug is able to find the root cause.