Interaction-Based Offline Runtime Verification of Distributed Systems

Mahe, Erwan; Bannour, Boutheina; Gaston, Christophe; Lapitre, Arnault; Gall, Pascale Le

doi:10.1007/978-3-031-42441-0_7

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14155 ))

Included in the following conference series:

International Conference on Fundamentals of Software Engineering

122 Accesses

Abstract

Interactions are formal models describing asynchronous communications within a distributed system. They can be drawn in the fashion of sequence diagrams and associated with an operational semantics in the style of process algebras. In this paper, we propose an algorithm for offline runtime verification against interactions. Our algorithm deals with observability issues e.g. that some subsystems may not be observed or that some events may not be observed when the end of monitoring on different subsystems cannot be synchronized. We prove the algorithm’s correctness and assess the performance of an implementation.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Monitoring Distributed Component-Based Systems

Better Late Than Never or: Verifying Asynchronous Components at Runtime

Challenges in Fault-Tolerant Distributed Runtime Verification

Keywords

1 Introduction

Context. Distributed Systems (DS) have been identified in the recent survey [26] as one of the most challenging application domains for Runtime Verification (RV). An important bottleneck is that the formal references against which system executions are analyzed are specified using formalisms or logics usually equipped with trace semantics. Indeed, because DS are composed of subsystems deployed on different computers and communicating via message passing, their executions are more naturally represented as collections of traces observed at the level of the different subsystems’ interfaces rather than as single global traces [7, 24]. Those collections can be gathered using a distributed observation architecture involving several local observation devices, each one dedicated to a subsystem, and deployed on the same computer as the subsystem it is dedicated to. An approach to confront such collections of local execution traces to formal references with a trace semantics might consist in identifying the global traces that result from all possible temporal orderings of the events occurring in the local traces. If none of those global traces conforms to the formal reference, then we might conclude that an error is observed [24]. However, the absence of a global clock implies that, in all generality, it is not possible to synchronize the endings of the different local observation processes. Therefore, in the process of reconstructing global traces, some events might be missing in local traces. Such problems occur whenever, for technical or legal reasons, it is not possible to observe some subsystems or else the observation has been interrupted too early.

Contributions. In this paper, we propose a Runtime Verification approach dedicated to DS with an emphasis on overcoming issues of partial observability, whether due to the absence of a global clock, or to the impossibility of observing some subsystem executions. Our approach belongs to the family of offline RV techniques in which traces are logged prior to their analysis. As for formal references, we inherit the framework of interaction models from earlier works [18, 20]. Interactions describe actor-oriented scenarios and can be represented graphically in the fashion of UML Sequence Diagrams (UML-SD) [25] or Message Sequence Charts (MSC) [14]. We designed in [18] an algorithm to decide whether or not a collection of local traces is accepted by an interaction. However, this algorithm cannot cope with partial observability. The core contribution of this paper is then to define an algorithm to tackle those limitations, i.e. to deal with collections of local traces with missing or incomplete ones. Theorem 1 will enable us to relate collections of local traces reflecting partially observed executions to those of the original reference interaction. The key operator in our algorithm is a removal operator (Definition 5) discarding parts of the interaction relative to unobserved subsystems. We prove the correctness of our algorithm and argue how the use of the removal operations allows us to solve partial observability (Theorem 2). All proofs are available in [19]. Finally, we present some experiments using an implementation of our algorithm, given as an extension of the HIBOU tool [17].

Related Work. Solutions to the oracle problem (offline RV) for DS using local logs often rely on a preliminary reordering of events using either timestamps [24] or some happened-before relations (of Lamport [15]) [7, 16, 23]. In [9, 11, 12] such solutions rely on a set of discrete and local behavioral models. DS behaviors are modeled by Input/Output Transition Systems (IOTS) [11, 12] or by Communicating Sequential Processes (CSP) [9] and local observations are intertwined to associate them with global traces that can be analyzed w.r.t. models. Those approaches however require to synchronize local observations, based on the states in which each of the logging processes terminates (e.g., based on quiescence states in [11], termination/deadlocks in [9] or pre-specified synchronization points in [12]). The works [8, 10, 13, 24] focus on verifying distributed executions against models of interaction (while [10, 13] concern MSC, [24] considers choreographic languages, [8] session types, and [4] trace expressions). [10, 24] propose offline RV that relies on synchronization hypotheses and on reconstructing a global trace by ordering events occurring at the distributed interfaces (by exploiting the observational power of testers [10] or timestamp information assuming clock synchronization [24]). Our RV approach for multitraces does not require synchronization prerequisites on DS logging. Thus, unlike previous works on offline RV, we can analyze DS executions without needing a synchronization hypothesis on the ending of local observations. For online RV, the work [13] depends on a global component (network sniffer) while the work [8] proposes local RV against projections of interactions satisfying conditions that enforce intended global behaviors. In contrast to these works, we process collections of local logs against interactions. The work [4] focuses on how distributed monitors can be adapted for partial observation. Yet, our notion of partial observation is distinct from that of [4] where messages are exchanged via channels which are associated to an observability likelihood. [4] proposes specification transformations by removing or making optional several identified unobservable events. We instead deal with partial observability from the perspective of analyzing truncated multitraces due to synchronization issues.

Paper Outline. Sect. 2 discusses the nature of DS, their modelling with interactions and the challenge of applying RV to DS. Section 3 defines multitraces, interactions and associated removal operations. Section 4 defines and proves the correctness of our RV algorithm. Section 5 reports experimental results.

2 Preliminaries

Notations. Given a set A, $A^*$ is the set of words on A, with $\varepsilon $ the empty word and the “." concatenation law. For any word $w \in A^*$, |w| is the length of w and any word $w'$ is a prefix of w if there exists a word $w''$, possibly empty, such that $w = w'.w''$. Let us note $\overline{w}$ the set of prefixes of a word $w \in A^*$ and $\overline{W}$ the set of prefixes of all words of a set $W \subseteq A^*$. Given a set A, |A| designates its cardinal and $\mathcal {P}(A)$ is the set of all subsets of A.

Distributed Systems. From a black box perspective, the atomic concept to describe the executions of DS is that of communication actions occurring on a subsystem’s interface. Here a subsystem refers to a software system deployed on a single machine. Anticipating the use of interactions as models in Sect. 3.2, a subsystem interface is called a lifeline and corresponds to an interaction point on which the subsystem can receive or send some messages. Lifelines are elements of a set $\mathcal {L}$ denoting the universe of lifelines. An action occurring on a lifeline is defined by its kind (emission or reception, identified resp. by the symbols ! and ?) and by the message which it carries. We introduce the universe $\mathcal {M}$ of messages. Executions observed on a lifeline l can be modelled as execution traces i.e. sequences of actions. For $l\in \mathcal {L}$, the set $\mathbb {A}_l$ of actions over l is $\{ l \varDelta m ~|~ \varDelta \in \{!,?\}, ~ m \in \mathcal {M} \}$ and the set $\mathbb {T}_l$ of traces over l is $\mathbb {A}_l^*$. For any $a \in \mathbb {A}_l$ of the form l?m or l!m, $\theta (a)$ refers to l.

Figure 1 sketches out an example of DS composed of three remote subsystems, assimilated to their interface , and . This DS implements a simplified publish/subscribe scheme of communications (an alternative to client-server architecture). The publisher may publish messages on the broker which may then forward them to the subscriber if it is already subscribed. Figure 1c depicts an interaction defined between the three lifelines. Each lifeline is depicted by a vertical line labelled by its name at the top. By default, the top to bottom direction represents time passing. A communication action depicted above another one on the same lifeline occurs beforehand. Communication actions are represented by horizontal arrows labelled with the action’s message. Whenever an arrow exits (resp. enters) a lifeline, there is a corresponding emission (resp. reception) action at that point on the line. For example, the horizontal arrow from the lifeline to the lifeline indicates that the subsystem sends the message , denoted as ! , which is then received by the lifeline , denoted as ? . More complex behaviors can be introduced through operators (similar to combined fragments in UML-SD) drawn in the shape of boxes that frame sub-behaviors of interest. For instance, in Fig. 1c, $loop_S$ corresponds to a sequential loop. From the perspective of the lifeline, this implies that it can observe words of the form i.e. it can receive an arbitrary number of instances of the message then one instance of and then it can receive and transmit an arbitrary number of . A representative global trace specified by the interaction in Fig. 1c is (see Fig. 1a):

This trace illustrates that the and lifelines can send their respective messages and in any order since there are no constraints on their ordering. In contrast, the reception of a message necessarily takes place after its emission. Since the reception of the message takes place before that of the message, this last message necessarily corresponds to the one occurring in the bottom loop. The global trace in Fig. 1a is a typical example of a trace accepted by the interaction in Fig. 1c. Indeed, this trace realizes one of the behaviors specified by the interaction which corresponds to: unfolding zero times the first loop; realizing the passing of the message between lifelines and ; unfolding one time the second loop. None of the prefixes of this accepted trace is an accepted trace.

Accepted Multitraces. Following the terminology of [7, 18], we call multitrace a collection of local traces, one per remote subsystem. Figure 1b depicts a multitrace involving 3 local traces: for subsystem , for , and for . It is possible to interleave these local traces to obtain the global trace in Fig. 1a, i.e. the multitrace in Fig. 1b corresponds to the tuple of projections of the global trace in Fig. 1a onto each of the sub-systems. The tuple of projections of a global trace is unique. However, conversely, one might compute several global traces associated to the same tuple of local traces. This is because, in all generality, there is no ordering between actions occurring on different lifelines. For example, from the multitrace of Fig. 1b, one could reconstruct the global trace:

The tuple of projections of this global trace is also the multitrace in Fig. 1b. With the algorithm from [18] one can recognize exactly accepted multitraces (e.g. the one from Fig. 1b), which correspond to projections of accepted global traces (e.g. Figure 1a).

Logging and Partial Observability. Offline RV requires to collect execution traces prior to their analyses. In this process, it might be so that some subsystems cannot be equipped with observation devices. Moreover, due to the absence of synchronization between the local observations, the different logging processes might cease at uncorrelated moments. For example, let us consider the multitrace in Fig. 1d as an observed execution of the system considered in Fig. 1, where, by hypothesis, the subsystem is not observed. Remark that this multitrace corresponds to a partial observation of the multitrace in Fig. 1b. Indeed, each trace corresponding to a given subsystem in Fig. 1d is a prefix of the trace corresponding to the same sub-system in Fig. 1b. Thus, if executions were also observed and with longer observation times for each local observation processes, it may well be that one would have observed the multitrace in Fig. 1b rather than the one in Fig. 1d. when analyzing the multitrace in Fig. 1d against the interaction in Fig. 1c, we need the RV process not to conclude on the occurrence of an error. Hence, we want to be able to recognize multitraces in which each of the local traces can be extended to reconstruct a multitrace accepted by the interaction. Concretely, this means recognizing multi-prefixes of accepted multitraces. Let us remark that a projection of a prefix of an accepted global trace is a multi-prefix of an accepted multitrace. However the reverse is not true. For example, there exists no prefix of a global trace accepted by the interaction in Fig. 1c that projects on the multitrace in Fig. 1d. This is because the emission of by would precede its reception by in any accepted global trace. However, this emission is not observed in the multitrace in Fig. 1d. Therefore, dealing with partial observability does not boil down to a simple adaptation of the algorithm in [18]. In this paper, the aforementioned two types of partial observation (unobserved subsystems and early interruption of observation) will be approached in the same manner, noting in particular that an empty local trace can be seen both as missing and incomplete. The key mathematical operator used for that purpose consists in the removal of a lifeline from both interactions and multitraces. This operator allows us to define an algorithm for recognizing multi-prefixes of accepted multitraces while avoiding the complex search for a matching global execution, taking into account potential missing actions.

3 Multitraces, Interactions, and Removal Operations

3.1 Multitraces

As outlined in Sect. 2, a DS is a collection of communicating subsystems, each having a lifeline as local interface. A DS is characterized by a finite set of lifelines $L \subseteq \mathcal {L}$, called a signature. For $L \subseteq \mathcal {L}$, $\mathbb {A}(L)$ denotes the set $\cup _{l \in L} \mathbb {A}_l$. Executions of a DS are associated to multitraces, i.e. collections of traces, one per lifeline:

Definition 1

Given $L \subseteq \mathcal{L}$, the set $\mathbb {M}(L)$ of multitraces over L is^{Footnote 1} $\prod _{l \in L} \mathbb {T}_l$.

For $\mu = (t_l)_{l \in L}$ in $\mathbb {M}(L)$, we denote by $\mu _{|l}$ the trace component $t_l \in \mathbb {T}_l$ and by $\overline{\mu } = \{ \mu ' ~|~\mu ' \in \mathbb {M}(L), \forall l \in L, \mu '_{|l} \in \overline{\mu _{|l}}\}$ the set of its multi-prefixes.

Multi-prefixes are extended to sets: $\overline{M}$ is the set of all multi-prefixes of all multitraces in $M \subseteq \mathbb {M}(L)$. We denote by $\varepsilon _L$ the empty multitrace in $\mathbb {M}(L)$ defined by $\forall l \in L, {\varepsilon _L}_{|l} = \varepsilon $. Additionally, for any $\mu \in \mathbb {M}(L)$, we use the notations $\mu [t]_l$ to designate the multitrace $\mu $ in which the component on l has been replaced by $t \in \mathbb {T}_l$ and $|\mu |$ to designate the cumulative length $|\mu | = \sum _{l \in L} |\mu _{|l}|$ of $\mu $.

As discussed in Sect. 2, two communication actions occurring on different traces of a multitrace cannot be temporally ordered. Likewise, when several subsystems are observed concurrently, there is no way to synchronize the endings of their observations. So, any multitrace $\mu ' \in \overline{\mu }$ can be understood as a partial observation of the execution characterized by $\mu $. An edge case of this partial observation occurs when some of the subsystems are not observed at all, i.e. when some lifelines are missing. The $\textsf{rmv}_h$ function of Definition 2 simply removes the trace concerning the lifeline h from a multitrace.

Definition 2

For $L \subseteq \mathcal{L}$, the function $\textsf{rmv}_h: \mathbb {M}(L) \rightarrow \mathbb {M}(L\setminus \{h\})$ is s.t.: $ \forall \mu \in \mathbb {M}(L), \; \textsf{rmv}_h(\mu ) = (\mu _{|l})_{l \in L\setminus \{h\}} $

The function $\textsf{rmv}_h$ is canonically extended to sets. We introduce operations to add an action to the left (resp. right) of a multitrace. For the sake of simplicity, we use the same symbol for these left- and right-concatenation operations:

Note that for any $\mu $ and a, we have . We extend to sets of multitraces as follows: and .

For two multitraces $\mu _1$ and $\mu _2$ in $\mathbb {M}(L)$:

$\mu _1 \cup \mu _2$ denotes the alternative defined as follows: $\mu _1 \cup \mu _2 = \{ \mu _1,\mu _2 \}$;
$\mu _1 ;\mu _2$ denotes their sequencing defined as follows: if $\mu _2 = \varepsilon _L$ then $\mu _1 ;\mu _2 = \mu _1$ else, $\mu _2$ can be written as and ;
denotes their interleaving and is defined as the set of multitraces describing parallel compositions of $\mu _1$ and $\mu _2$:

Let us remark that $\mu '$ is a prefix of a multitrace $\mu $ (i.e. $\mu ' \in \overline{\mu }$) iff there exists $\mu ''$ verifying $\mu ' ; \mu '' = \mu $. Operations $\cup $, $;$ and $||$ are extended to sets of multitraces as $\diamond : \mathcal {P}(\mathbb {M}(L))^2 \rightarrow \mathcal {P}(\mathbb {M}(L))$ for $\diamond \in \{ \cup ,~;,~||\}$. Operators $;$ and $||$ being associative, this allows for the definition of repetition operators in the same manner as the Kleene star is defined over the classical concatenation. Given $\diamond \in \{ ;,~||\}$, the Kleene closure $^{\diamond *}$ is s.t. for any set of multitraces $T \subseteq \mathbb {M}(L)$ we have:

$$ T^{\diamond *} = \bigcup _{\begin{array}{c} j \in \mathbb {N} \end{array}} T^{\diamond j} \text{ with } T^{\diamond 0} = \{ \varepsilon _L \} \text{ and } T^{\diamond j} = T \diamond T^{\diamond (j-1)} \text{ for } j > 0 $$

$\mathbb {M}(L)$ fitted with the set of algebraic operators $\mathcal {F} = \{\cup , ;, ||, ~^{;*}, ~^{||*} \}$ is an $\mathcal {F}$-algebra. The operation $\textsf{rmv}_h$ preserves the algebraic structures between the $\mathcal {F}$-algebras of signatures L and $L \setminus \{h\}$.

Property 1 (Elimination preserves operators)

For any $\mu _1$ and $\mu _2$ in $\mathbb {M}(L)$, for any $\diamond \in \{\cup ,;,||\}$, $\textsf{rmv}_h(\mu _1 \diamond \mu _2) = \textsf{rmv}_h(\mu _1) \diamond \textsf{rmv}_h(\mu _2)$.

Property 1 is obtained directly for the union and by induction for the other cases. Those results can be extended to sets of multitraces and imply that repetitions of those scheduling algebraic operators (with their Kleene closures) are also preserved by the elimination operator $\textsf{rmv}_h$.

3.2 Interactions

Interaction models, such as the one in Fig. 1c, can be formalized as terms of an inductive language. [18, 20] consider an expressive language with two sequencing operators, weak and strict, for ordering actions globally. Here, as only collections of remote local traces are considered, weak and strict sequencing can no longer be distinguished. This explains why we only consider a unique sequencing operator seq. Following the syntax from Definition 3, the interaction term of Fig. 1c is:

Definition 3

Given signature L, the set $\mathbb {I}(L)$ of interactions over L is the set of ground terms built over the following symbols provided with arities in $\mathbb {N}$:

− the empty interaction $\varnothing $ and any action a in $\mathbb {A}(L)$ of arity 0;

− the two loop operators $loop_S$ and $loop_P$ of arity 1;

− and the three operators seq, par and alt of arity 2.

The semantics of interactions can be defined as a set of multitraces in a denotational style by associating each syntactic operator with an algebraic counterpart. This is sketched out in Fig. 2 in which the semantics of the interaction given in Fig. 1c is given. The denotational formulation, which is compositional, is defined in Definition 4 and illustrated in Fig. 1c.

Definition 4

($\mathbb {M}$-semantics). Given $L \subseteq \mathcal{L}$, the multitrace semantics $\sigma _{|L} : \mathbb {I}(L) \rightarrow \mathcal {P}(\mathbb {M}(L))$ is defined inductively using the following interpretations:

− $\{ \varepsilon _L \}$ for $\varnothing $ and for a in $\mathbb {A}_L$;

− $;^*$ (resp. $||^*$) for loop operator $loop_S$ (resp. $loop_P$);

− $;$ (resp. $||$ and $\cup $) for binary operator seq (resp. par and alt).

Interactions can also be associated with operational semantics in the style of Plotkin [21]. Its definition relies on two predicates denoted by $\downarrow $ and $\rightarrow $: for an interaction i, $i \downarrow $ states that $\varepsilon _L \in \sigma _{|L}(i)$ and $i \xrightarrow {a} i'$ states that all multitraces of the form with $\mu ' \in \sigma _{|L}(i')$ are multitraces of $\sigma _{|L}(i)$. This operational semantics is equivalent to the denotational formulation.

Property 2 (Operational semantics)

There exist a predicate $\downarrow \subseteq \mathbb {I}(L)$ and a relation $\rightarrow \subseteq \mathbb {I}(L) \times \mathbb {A}(L) \times \mathbb {I}(L)$ such that, for any $i \in \mathbb {I}(L)$ and $\mu \in \mathbb {M}(L)$, the statement $\mu \in \sigma _{|L}(i)$ holds iff it can be proven using the following two rules:

The proof is a transposition for multitrace semantics of the proof in [21] given for global traces. The algebraic characterisation of Definition 4 underpins results involving the use of the $\textsf{rmv}_h$ function while the operational characterisation of Property 2 is required in the definition and proof of the RV algorithm. In this paper, we do not need the inductive definitions of $\downarrow $ and $\rightarrow $. It suffices to consider their existence (Property 2). In addition, we will use the notation $i\xrightarrow {a}$ (resp. ) when there exists (resp. does not exist) an interaction $i'$ s.t. $i \xrightarrow {a} i'$.

The removal of lifelines for multitraces (cf. Definition 2) has a counterpart for interactions. On the left of Fig. 3 we draw our previous example while highlighting lifeline which we remove to obtain the interaction on the right. Whenever we remove a lifeline l, the resulting interaction does not contain any action occurring on l. Removal, as defined in^{Footnote 2} Definition 5 in a functional style, preserves the term structure, replacing actions on the removed lifeline with the empty interaction.

Definition 5

For a signature $L \subseteq \mathcal{L}$ and a lifeline $h \in L$ we define $\textsf{rmv}_h: \mathbb {I}(L) \rightarrow \mathbb {I}(L \setminus \{h\})$ s.t. for any interaction $i \in \mathbb {I}(L)$:

$\textsf{rmv}_h(i) = {\textbf { match }} i {\textbf { with}}$

$\begin{array}{lll} |~\varnothing &{} \rightarrow &{} \varnothing \\ |~a \in \mathbb {A}(L) &{} \rightarrow &{} {\textbf {if }} \theta (a) = h {\textbf { then }} \varnothing {\textbf { else }} a \\ |~f(i_1,i_2) &{} \rightarrow &{} f(\textsf{rmv}_h(i_1),\textsf{rmv}_h(i_2)) {\textbf { for }} f \in \{ seq,alt,par \} \\ |~loop_k(i_1) &{} \rightarrow &{} loop_k(\textsf{rmv}_h(i_1)) {\textbf { for }} k \in \{S,P\} \end{array}$

Theorem 1 relates the removal operations on multitraces and interactions with one another. The semantics of an interaction i in which we remove lifeline h can be obtained by removing lifeline h from all the multitraces of the semantics of i. This result is obtained reasoning by induction on interaction terms.

Theorem 1

For any signature L, any $i \in \mathbb {I}(L)$ and any $h \in L$:

$$ \sigma _{|L \backslash \{ h \} }(\textsf{rmv}_h(i)) = \textsf{rmv}_h(\sigma _{|L}(i)) $$

4 Offline RV for Multitraces

We aim to define a process to analyze a multitrace $\mu $ against a reference interaction i, both defined on a common signature L. To check whether or not a multitrace $\mu $ is accepted by i, i.e. $\mu \in \sigma _{|L}(i)$, the key principle given in [18] was to find a globally ordered behavior specified by i (via the $\rightarrow $ execution relation) that matches $\mu $, i.e. an accepted global trace that can be projected into $\mu $. To do so, it relies on a general rule s.t. $i \xrightarrow {a} i'$, i.e. it explores all the actions a directly executable from i and that match the head of a local trace. The analysis is then pursued recursively from $(i',\mu ')$, i.e. the multitrace where a has been removed and the follow-up interaction $i'$, until the multitrace is emptied of actions.

For illustrative purposes, let us consider Fig. 4 where each square annotated with a circled number corresponds to such a tuple $(i,\mu )$, with interaction i drawn on the left and multitrace $\mu $ represented on the right. Starting from the tuple indexed by , with interaction $i_3$ and multitrace , one can see that we can reach by both consuming from $\mu _3$ and executing it in $i_3$, leading to the tuple $(i_4,\mu _4)$ in . This transition is based on having . Thus, Fig. 4 sketches the construction of a graph whose nodes are pairs of interactions and multitraces and whose arcs are built using the $\leadsto $ relation.

While in [18], we were interested in solving the membership problem "$\mu \in \sigma _{|L}(i)$", we are now interested in defining an offline RV algorithm. In line with the discussion of Sect. 2 about partial observability, $\mu $ reveals an error if $\mu $ is neither in $\sigma _{|L}(i)$ nor can be extended into an element of $\sigma _{|L}(i)$ i.e. $\mu $ diverges from i iff $\mu \not \in \overline{\sigma _{|L}(i)}$. We introduce a rule involving the removal operation to accommodate the need to identify multi-prefixes of multitraces. Indeed, as the execution relation $\rightarrow $ only allows executing actions in the global order in which they are intended to occur, we may reach cases in which the next action which may be consumed in the multitrace cannot be executed due to having a preceding action missing in the multitrace.

Let us illustrate this with node of Fig. 4. is the first action that occurs on lifeline in the multitrace. However, it cannot be executed because it must be preceded by . Yet, either because the behavior on lifeline is not observed or because the logging process ceased too early on , it might well be that occurred in the actual execution although it was not logged. With our new algorithm, because the condition that is satisfied, from node , we apply a rule yielding the transformation , removing lifeline , which allows us to pursue the analysis from node . To summarize, Fig. 4 illustrates (part of) the graph that can be constructed from a pair $(i_0,\mu _0)$ using the relation $\leadsto $. We have 5 nodes numbered from 0 (the initial node of the analysis) to 4. Arcs correspond to the consumption of an action, the application of the $\textsf{rmv}$ operator, or the emission of a verdict. The empty multitrace in node allows us to conclude $\mu _0 \in \overline{\sigma _{|L}(i_0)}$.

4.1 The Algorithm

As the $\textsf{rmv}$ operator has the effect of changing the signature, we introduce the set $\mathbb {I}_{\mathcal {L}}$ (resp. $\mathbb {M}_{\mathcal {L}}$) to denote the set of all interactions (resp. multitraces) defined on a signature of ${\mathcal {L}}$. Let us define a directed search graph with vertices either of the form $(i,\mu ) \in \mathbb {I}_{\mathcal {L}} \times \mathbb {M}_{\mathcal {L}}$ or one of two specific verdicts $Ok$ and $Nok$.

We denote by $\mathbb {V}$ the set of all vertices:

$$\begin{aligned} \mathbb {V} = \{Ok,Nok\} \cup ( \; \bigcup _{ L \subseteq {\mathcal {L}}} \mathbb {I}(L) \times \mathbb {M}(L) \; ) \end{aligned}$$

The arcs of $\mathbb {G}$ are defined by 4 rules: , leading to respectively the sink vertices and $Nok$, (for “execute") for consuming actions of the multitrace according to the $\rightarrow $ predicate of the operational formulation (cf. Property 2), and (for “removal"), for removing a lifeline from the interaction and multitrace.

Definition 6 (Search graph)

$\mathbb {G} = (\mathbb {V},\leadsto )$ is the graph s.t. for all $v,v'$ in $\mathbb {V}$, $v \leadsto v'$ iff there exists a rule $R_x$ with $x \in \{o,n,e,r\}$ s.t. $(R_x) \frac{v}{v'}$ where rules $R_x$ are defined as follows, with $L\subseteq \mathcal {L}$, $h\in L$, $i,i'\in \mathbb {I}(L)$, and $\mu ,\mu '\in \mathbb {M}(L)$:

Rules and specify edges of the form $(i,\mu ) \leadsto (i',\mu ')$ with $i'$ and $\mu '$ defined on the same signature: the application of corresponds to the simultaneous consumption of an action at the head of a component of $\mu $ and the execution of a matching action in i while the application of corresponds to the removal of a lifeline h s.t. $\mu _{|h} = \varepsilon $. Moreover, vertices of the form $(i,\mu )$ are not sinks of $\mathbb {G}$. Indeed, if $\mu = \varepsilon _{L}$ then can apply, otherwise $\mu \ne \varepsilon _{L}$ and: (1) if at least a component $\mu _{|h}$ of $\mu $ is empty, then rule can apply. (2) if there is a match between an action that can be executed from i and the head of a component of the multitrace then rule can apply. (3) if both conditions 1 and 2 do not hold then rule applies.

Proving $\mu \in \overline{\sigma _{|L}(i)}$, amounts to exhibiting a path in $\mathbb {G}$ starting from $(i,\mu )$ and leading to the verdict $Ok$. Figure 4 depicts such a path for the multitrace w.r.t. the interaction $i_0$ of node . A first step (application of ) removes lifeline leading to node . This is possible because . From there, by applying rule , the execution of allows to reach either node or node depending on the loop used. From node , the previous removal of lifeline has unlocked the execution of (application of ). What remains is $\varepsilon _L$, and hence we can apply rule . From the existence of this path leading to $Ok$ we conclude that $\mu _0$ is a multi-prefix of a multitrace of the interaction depicted in Fig. 1c.

Property 3 (Finite search space)

Let $L \subseteq \mathcal {L}$, $\mu \in \mathbb {M}(L)$ and $i \in \mathbb {I}(L)$. The sub-graph of $\mathbb {G}$ of all vertices reachable from $(i,\mu )$ is finite.

We establish this property by using a measure |v| defined on the vertices v in $\mathbb {V}$ by $|v|= 0$ if $v \in \{Ok,Nok\}$ and $|v| = |\mu |+|L| + 1$ if $s = (i,\mu ) \in \mathbb {I}(L) \times \mathbb {M}({L})$.

Definition 7 (Multitrace analysis)

For any $L \subset \mathcal {L}$, we define $\omega _L : \mathbb {I}(L) \times \mathbb {M}({L}) \rightarrow \{Pass,Fail\}$ s.t. for any $i \in \mathbb {I}(L)$ and $\mu \in \mathbb {M}(L)$:

− $\omega _L(i,\mu ) = Pass$ iff there exists a path in $\mathbb {G}$ from $(i,\mu )$ to $Ok$

− $\omega _L(i,\mu ) = Fail$ otherwise

Given Property 3, Definition 7 is well founded insofar as the sub-graph of $\mathbb {G}$ issued from any pair $(i,\mu )$ of $\mathbb {V}$ is finite and all paths from $(i,\mu )$ can be extended until reaching a verdict ($Ok$ or $Nok$). Then, we need to prove that the existence of a path from $(i,\mu )$ to $Ok$ guarantees that $\mu $ is a prefix of a multitrace of i, and that the non-existence of such a path guarantees that $\mu $ is not such a prefix. By reasoning by induction on the measure of the vertices of $\mathbb {G}$, we can establish:

Theorem 2

For any $i \in \mathbb {I}(L)$ and any $\mu \in \mathbb {M}(L)$:

$$ \left( \begin{array}{c} \mu \in \overline{\sigma _{|L}(i)} \end{array} \right) \Leftrightarrow \left( \begin{array}{c} \omega _L(i,\mu ) = Pass \end{array} \right) $$

4.2 Considerations on Implementation

Using a reduction of the 3 SAT problem inspired by [3, 18], we can state that the problem of recognizing correct multi-prefixes w.r.t. interactions is NP-hard:

Property 4

The problem of determining whether or not $\mu \in \overline{\sigma _{|L}(i)}$ is NP-hard.

Given the NP-hardness of the underlying problem, the implementation of our algorithm, which relies on the exploration of a graph $\mathbb {G}$, uses additional techniques to reduce the average complexity. Such techniques may include means to cut parts of the graph, the use of pertinent search strategies and priorities for the application of the rules. For instance, if is applicable from a node $(i,\mu )$, we can apply $\textsf{rmv}$ on all lifelines which can be removed simultaneously. Also, if both and are applicable from that same node, we can choose not to apply . Those two points are respectively justified by a property of commutativity for $\textsf{rmv}$ (i.e. $\textsf{rmv}_h \circ \textsf{rmv}_{h'} = \textsf{rmv}_{h'} \circ \textsf{rmv}_h$) and a confluence property for $\leadsto $ (i.e. if $(i, \mu ) \overset{*}{\leadsto }\ Ok$ and $(i, \mu ) \leadsto (\textsf{rmv}_{h}(i),\textsf{rmv}_{h}(\mu ))$ then $(\textsf{rmv}_{h}(i),\textsf{rmv}_{h}(\mu )) \overset{*}{\leadsto }\ Ok$).

5 Experimental Assessment

5.1 3 SAT Benchmarks

We have implemented our approach as an extension of the tool HIBOU [17]. In light of Property 4, we have compared the results HIBOU obtained on translated three SAT problems against those of an SAT solver (Varisat [2]). We have used three sets of problems: two custom benchmarks with randomly generated problems and the UF20 benchmark [1]. Figure 5 provides details on 2 benchmarks with, on the top left, information about the input problems (numbers of variables, clauses, instances), on the bottom left statistical information about the time required for the analysis using each tool, and, on the right a corresponding scatter plot. In the plot, each point corresponds to a given 3-SAT problem, with its position corresponding to the time required to solve it. Points in red are unsatisfiable problems while those in blue are satisfiable.

5.2 Use Cases Experiments

To consider concrete and varied interactions, we experiment with four examples: a protocol for purchasing books [4], a system for querying complex sensor data [5], the Alternating Bit Protocol [22] and a network for uploading data to a server [6]. Figure 6 partially reports on those experiments. For each example, we generated randomly accepted multitraces (ACPT) up to some depth, for which we then randomly selected prefixes (PREF). For each such prefix, we then performed mutations of three kinds: swapping actions (SACT), swapping trace components (SCMP) and inserting noise (NOIS). We report for each category of multitraces times to compute verdicts in Fig. 6. As expected, running the algorithm on those multitraces recognizes prefixes and mutants which go out of specification.

6 Conclusion

We have proposed an algorithm for offline RV from multitraces (sets of local execution logs collected on the DS) against interaction models (formal specifications akin to UML-SD/MSC). These multitraces can be partial views of DS executions because some components may either not be observed at all or their observation may have ceased too early. We have proved the correctness of our algorithm which boils down to a graph search. This search is based on two principles, either we match actions of the interaction against those of the input multitrace, or we apply a removal operation on multitraces and interactions. Removal steps allow dealing with observability via disregarding components which are no longer observed parts of the interaction. Future works include other uses of the removal operator (e.g. for performance improvements on RV).

Notes

1.
Given a family $(A_i)_{i \in I}$ of sets indexed by a finite set I, $\prod _{i \in I} A_i$ is the set of tuples $(a_1, \ldots , a_i,\ldots )$ with $\forall i \in I, a_i \in A_i$.
2.
We overload the notation $\textsf{rmv}_h$ which applies to both multitraces and interactions.

References

SATLIB - Benchmark. https://www.cs.ubc.ca/~hoos/SATLIB/benchm.html
Varisat CDCL solver. https://docs.rs/varisat/latest/varisat/
Alur, R., Etessami, K., Yannakakis, M.: Realizability and verification of MSC graphs. In: Orejas, F., Spirakis, P.G., van Leeuwen, J. (eds.) ICALP 2001. LNCS, vol. 2076, pp. 797–808. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-48224-5_65
Chapter MATH Google Scholar
Ancona, D., Ferrando, A., Franceschini, L., Mascardi, V.: Coping with bad agent interaction protocols when monitoring partially observable multiagent systems. In: Demazeau, Y., An, B., Bajo, J., Fernández-Caballero, A. (eds.) PAAMS 2018. LNCS (LNAI), vol. 10978, pp. 59–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94580-4_5
Chapter Google Scholar
Bakillah, M., Liang, S., Zipf, A., Mostafavi, M.A.: A dynamic and context-aware semantic mediation service for discovering and fusion of heterogeneous sensor data. J. Spatial Inform. Sci. 6, 155–185 (06 2013)
Google Scholar
Bejleri, A., Domnori, E., Viering, M., Eugster, P., Mezini, M.: Comprehensive multiparty session types. The Art, Science, and Engineering of Programming 3 (02 2019)
Google Scholar
Benharrat, N., Gaston, C., Hierons, R.M., Lapitre, A., Le Gall, P.: Constraint-based oracles for timed distributed systems. In: Yevtushenko, N., Cavalli, A.R., Yenigün, H. (eds.) ICTSS 2017. LNCS, vol. 10533, pp. 276–292. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67549-7_17
Chapter Google Scholar
Bocchi, L., Chen, T., Demangeon, R., Honda, K., Yoshida, N.: Monitoring networks through multiparty session types. Theor. Comput. Sci. 669, 33–58 (2017)
Article MathSciNet MATH Google Scholar
Cavalcanti, A., Gaudel, M.-C., Hierons, R.M.: Conformance relations for distributed testing based on CSP. In: Wolff, B., Zaïdi, F. (eds.) ICTSS 2011. LNCS, vol. 7019, pp. 48–63. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24580-0_5
Chapter Google Scholar
Dan, H., Hierons, R.M.: The oracle problem when testing from MSCs. Comput. J. 57(7), 987–1001 (2014)
Article Google Scholar
Hierons, R.M., Merayo, M.G., Núñez, M.: Controllable test cases for the distributed test architecture. In: Cha, S.S., Choi, J.-Y., Kim, M., Lee, I., Viswanathan, M. (eds.) ATVA 2008. LNCS, vol. 5311, pp. 201–215. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88387-6_16
Chapter MATH Google Scholar
Hierons, R.M., Merayo, M.G., Núñez, M.: Scenarios-based testing of systems with distributed ports. Softw. Pract. Exp. 41(10), 999–1026 (2011)
Google Scholar
Inçki, K., Ari, I.: A novel runtime verification solution for IoT Systems. IEEE Access 6, 13501–13512 (2018)
Article Google Scholar
ITU: Message Sequence Chart (MSC). http://www.itu.int/rec/T-REC-Z.120
Lamport, L.: Time, clocks, and the ordering of events in a distributed system. In: Concurrency: the Works of Leslie Lamport, pp. 179–196. ACM (2019)
Google Scholar
Mace, J., Roelke, R., Fonseca, R.: Pivot tracing: dynamic causal monitoring for distributed systems. In: SOSP, pp. 378–393. ACM (2015)
Google Scholar
Mahé, E.: Hibou tool. http://www.github.com/erwanM974/hibou_label (2022)
Mahé, E., Bannour, B., Gaston, C., Lapitre, A., Le Gall, P.: A small-step approach to multi-trace checking against interactions, pp. 1815–1822. SAC ’21, ACM (2021)
Google Scholar
Mahé, E., Bannour, B., Gaston, C., Lapitre, A., Le Gall, P.: Dealing with observability in interaction-based offline runtime verification of distributed systems. CoRR (2022). https://arxiv.org/abs/2212.09324
Mahé, E., Gaston, C., Le Gall, P.: Revisiting Semantics of Interactions for Trace Validity Analysis. In: FASE 2020. LNCS, vol. 5311, pp. 482–501. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45234-6_24
Mahé, E., Gaston, C., Le Gall, P.: Equivalence of denotational and operational semantics for interaction languages. In: TASE. pp. 113–130. Springer (2022). https://doi.org/10.1007/978-3-031-10363-6_8
Mauw, S., Reniers, M.A.: High-level message sequence charts. In: SDL Forum. pp. 291–306. Elsevier (1997)
Google Scholar
Neves, F., Machado, N., Pereira, J.: Falcon: A practical log-based analysis tool for distributed systems. In: DSN, pp. 534–541. IEEE Computer Society (2018)
Google Scholar
Nguyen, H.N., Poizat, P., Zaïdi, F.: Passive conformance testing of service choreographies. In: ACM SAC 2012, pp. 1528–1535 (2012)
Google Scholar
OMG: Unified Modeling Language, http://www.uml.org
Sánchez, C., et al.: A survey of challenges for runtime verification from advanced application domains (beyond software). Formal Methods in System Design , pp. 1–57 (2019). https://doi.org/10.1007/s10703-019-00337-w

Download references

Acknowledgements

The research leading to these results has received funding from the European Union’s Horizon Europe programme under grant agreement No 101069748 - SELFY project.

Author information

Authors and Affiliations

Université Paris -Saclay, CEA, List, 91120, Palaiseau, France
Erwan Mahe, Boutheina Bannour, Christophe Gaston & Arnault Lapitre
Université Paris-Saclay, CentraleSupélec, 91192, Gif-sur-Yvette, France
Pascale Le Gall

Authors

Erwan Mahe
View author publications
You can also search for this author in PubMed Google Scholar
Boutheina Bannour
View author publications
You can also search for this author in PubMed Google Scholar
Christophe Gaston
View author publications
You can also search for this author in PubMed Google Scholar
Arnault Lapitre
View author publications
You can also search for this author in PubMed Google Scholar
Pascale Le Gall
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Erwan Mahe .

Editor information

Editors and Affiliations

Tehran Institute for Advanced Studies, Tehran, Iran
Hossein Hojjat
RWTH Aachen University, Aachen, Germany
Erika Ábrahám

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mahe, E., Bannour, B., Gaston, C., Lapitre, A., Gall, P.L. (2023). Interaction-Based Offline Runtime Verification of Distributed Systems. In: Hojjat, H., Ábrahám, E. (eds) Fundamentals of Software Engineering. FSEN 2023. Lecture Notes in Computer Science, vol 14155 . Springer, Cham. https://doi.org/10.1007/978-3-031-42441-0_7

Download citation

DOI: https://doi.org/10.1007/978-3-031-42441-0_7
Published: 30 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42440-3
Online ISBN: 978-3-031-42441-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Federation for Information Processing (opens in a new tab)

Interaction-Based Offline Runtime Verification of Distributed Systems