Keywords

1 Introduction

Runtime Verification (RV) is an applied formal method for software reliability that analyzes the system by processing one trace at a time. In RV a specification is transformed automatically into a monitor, and algorithms are presented to evaluate monitors against traces of observations from the system. There are two kinds of monitoring algorithms in RV depending on when the trace is generated and processed. In online monitoring the monitor checks the trace while the system runs, while in offline monitoring a finite collection of previously generated traces are analyzed. Online monitoring is used to detect violations of the specification when the system is in operation, while offline monitoring is used in post-mortem analysis and for testing large systems before deployment.

Static verification techniques like model checking intend to show that every (infinite) run of a system satisfies a given specification, while runtime verification is concerned only with a single (finite) trace. Thus, RV sacrifices completeness to provide an applicable formal extension of testing. See [21, 24] for modern surveys on runtime verification and the recent book [4].

The first specification languages studied for runtime verification were based on temporal logics, typically LTL [6, 13, 22], regular expressions [28], timed regular expressions [1], rules [3], or rewriting [27]. In this paper we revisit the Stream Runtime Verification specification formalism, in particular the Lola specification language for synchronous systems [12]. The Lola language can express properties involving both the past and the future and their arbitrary combination. In SRV, specifications declare explicitly the dependencies between input streams of values—that represent the observations from the system—and output streams of values—that represent monitoring outputs, like error reports and diagnosis information. The fundamental idea of SRV is to cleanly separate the temporal reasoning from the individual operations to be performed at each step. The temporal aspects are handled in a small number of constructs to express the offsets between observations and their uses. For the data, SRV uses off-the-self domains with interpreted functions so function symbols can be used as constructors to create expressions, and their interpretation is used for evaluation during the monitoring process. The domains used for SRV are not restricted to Booleans and allow richer domains like Integers, Reals (for computing quantitative verdicts) and even queues, stacks, etc. These domains do not involve any reasoning about time. The resulting expressiveness of SRV surpasses that of temporal logics and many other existing formalisms including finite-state automata. The restriction of SRV to the domain of Booleans is studied in [10], including the expressivity, the comparison with logics and automata and the complexity of the decision problems.

The online monitoring problem of past specifications can be solved efficiently using constant space and linear time in the trace size. For future properties, on the other hand, the space requirement depends on the length of the trace for rich types (even though for LTL, that is for the verdict domain of the Booleans, one can use automata techniques to reduce the necessary space to exponential in the size of the specification). Consequently, online monitoring of future temporal formulas quickly becomes intractable in practical applications with long traces. On the other hand, the offline monitoring problem for LTL-like logics is known to be easy for purely past or purely future properties. We detail in the paper a syntactic characterization of efficiently monitorable specifications (introduced in [12]), for which the space requirement of the online monitoring algorithm is independent of the size of the trace, and linear in the specification size. This property was later popularized as trace length independence [5] and is a very desirable property as it allows online monitors to scale to arbitrarily large traces. In practice, most properties of interest in online monitoring can be expressed as efficiently monitorable properties. For the offline monitoring problem, we show an efficient monitoring strategy in the presence of arbitrary past and future combinations by scheduling trace length independent passes. We describe here the algorithm and results using the Lola specification language. An execution of the monitor extracted from a Lola specification computes data values at each position by evaluating the expressions over streams of input, incrementally computing the output streams.

Two typical specifications are properties that specify correct behavior, and statistical measures that allow profiling the system that produces the input streams. One important limitation of runtime verification is that liveness properties can never be violated on a finite trace. Hence, most of these properties have been typically considered as non-monitorable (for violation) as every finite prefix can be extended to a satisfying trace, at least if the system is considered as a black box and can potentially generate any suffix. An appealing solution that SRV supports is to compute quantitative measures from the observed trace. For example, apart from “there are only finitely many retransmissions of each package,” which is vacuously true over finite traces, SRV allows to specify “what is the average number of retransmissions.” Following this trend, runtime monitors can be used not only for bug-finding, but also for profiling, coverage, vacuity and numerous other analyses. An early approach for combining proving properties with data collection, which inspired SRV, appeared in [16].

In the present paper we present a simplified semantics of Lola [12] together with a detailed presentation of the monitoring algorithms as well as the necessary definitions and proofs. In the rest of the paper we use SRV and Lola interchangeably.

Related Work. The expressions that declare the dependencies between input streams and output streams in SRV are functional, which resemble synchronous languages—which are also functional reactive stream computation languages—like Lustre [20], Esterel [9] and Signal [17], with additional features that are relevant to monitoring. The main difference is that synchronous languages are designed to express behaviors and therefore assume the causality assumption and forbid future references, while in SRV future references are allowed to describe dependencies on future observations. This requires additional expressiveness in the language and the evaluation strategies to represent that the monitor cannot decide a verdict without observing future values. These additional verdicts were also introduced for this purpose in LTL-based logics, like \(\text {LTL}_3\) and \(\text {LTL}_4\) [6,7,8], to encode that the monitor is indecisive.

An efficient method for the online evaluation of past LTL properties is presented in [22], which exploits that past LTL can be recursively defined using only values in the previous state of the computation. The efficiently monitorable fragment of SRV specifications generalize this idea, and apply it uniformly to both verification and data collection. One of the early systems that most closely resembles Lola is Eagle [3], which allows the description of monitors using greatest and least fixed points of recursive definitions. Lola differs from Eagle in the descriptive nature of the language, and in that Lola is not restricted to checking logical formulas, but can also express numerical queries.

The initial application domain of Lola was the testing of synchronous hardware by generating traces of circuits and evaluating monitors against these traces. Temporal testers [26] were later proposed as a monitoring technique for LTL based on Boolean streams. Copilot [25] is a domain-specific language that, similar to Lola, declares dependencies between streams in a Haskell-based style, to generate C monitors that operate in constant time and space (the fragment of specifications that Copilot can describe is efficiently monitorable). See also [18].

The simple version of Lola presented here does not allow to quantify over objects and instantiate monitors to follow the actual objects observed, like in Quantified Event Automata [2]. Lola2.0 [14] is an extension of Lola that allows to express parametrized streams and dynamically generates monitors that instantiate these streams for the observed data items. The intended application of Lola2.0 is network monitoring.

Stream runtime verification has a also been extended recently to asynchronous and real-time systems. RTLola [15] extends SRV from the synchronous domain to timed streams. In RTLola streams are computed at predefined periodic instants of time, collecting aggregations between these predefined instants using a library of building blocks. TeSSLa [11] also offers a small collection of primitives for expressing stream dependencies (see also [23]) but allows to compute timed-streams at arbitrary real-time instants. The intended application of TeSSLa is hardware based monitoring. Striver [19] offers a Lola-like language with time offsets, that allows to express explicit instants of time in the expressions between streams. Striver is aimed at testing and monitoring of cloud based systems.

The rest of the paper is structured as follows. Section 2 revisits the syntax and semantics of SRV. Section 3 presents the online monitoring of SRV specifications, including the notion of efficient monitorability. Section 4 presents the algorithm for offline monitoring, and finally Sect. 5 concludes.

2 Overview of Stream Runtime Verification

In this section we describe SRV using the Lola specification language. The monitoring algorithms will be presented in Sects. 3 and 4.

2.1 Specification Language: Syntax

We use many-sorted first order interpreted theories to describe data domains. A theory is given by a finite collection T of types and a finite collection F of function symbols. Since our theories are interpreted every type T is associated with a domain D of values and every symbol f is associated with a computable function that, given elements of the domains of the arguments compute a value of the domain of the resulting type. We use sort and type interchangeably in this paper.

For example, the theory Boolean uses the type Bool associated with the Boolean domain with two values \(\{\top ,\bot \}\), and has constant symbols \( true \) and \( false \), and binary function symbols , and \(\mathrel {\vee }\), unary function symbol \(\mathclose {\lnot }\), etc. all with their usual interpretations. A more sophisticated theory is Naturals, the theory of the Natural numbers, that uses two types Nat and Bool. The type Nat is associated with the domain \(\{0,1,\ldots \}\) of the Natural numbers, and has constant symbols 0, 1, 2, ... and binary symbols \(+\), \(*\), etc. of type \(\textit{Nat}\times \textit{Nat}\mathrel {\rightarrow }\textit{Nat}\). Other function symbols in this theory are predicates <, \(\le \), ... of type \(\textit{Nat}\times \textit{Nat}\mathrel {\rightarrow }\textit{Bool}\). All our theories include equality and also, for every type T, a ternary predicate \(\texttt {if} \;\cdot \;\texttt {then} \;\cdot \;\texttt {else} \;\cdot \) of type \(\textit{Bool}\times T\mathrel {\rightarrow }T\). For simplicity we restrict the rest of the paper to types Nat and Bool.

Definition 1

(Stream Expression). Given a finite set Z of stream variable (each with a given type) the set of stream expressions is defined as follows:

  • Variable: If s is a stream variable of type T, then s is a stream expression of type T;

  • Function Application: Let \(f: T_1\ \times T_2\times \cdots \times T_k \mapsto \ T\) be a k-ary function symbol. If for \(1 \le i \le k\), \(e_i\) is an expression of type \(T_i\), then \(f(e_1, \ldots , e_k)\) is a stream expression of type T.

  • Offset: If v is a stream variable of type T, c is a constant of type T, and k is an integer value, then v[kc] is a stream expression of type T.

We use \( Expr (Z)\) for the set of stream expressions using stream variables Z.

Constants c (that is, 0-ary function symbols) and stream variables v are called atomic stream expressions. Stream variables are used to represent streams. Informally, the offset term v[kc] refers to the value of v offset k positions from the current position, where a negative offset refers to a past position in the stream and a positive offset refers to a future position in the stream. The constant c is the default value of type T assigned to positions from which the offset is past the end or before the beginning of the stream. For example \(v[-1, true ]\) refers to the previous position of stream v, with the value \( true \) when v does not have a previous position (that is when \(v[-1, true ]\) is evaluated at the beginning of the trace).

A Lola specification describes a relation between input streams and output streams. A stream \(\sigma \) of type T and length N is a finite sequence of values from the domain of T; \(\sigma (i)\), \( i \ge 0\) denotes the value of the stream at time step i.

Definition 2

(Lola specification). A Lola specification \(\varphi : \left\langle I,O,E \right\rangle \) consists of:

  • a finite set I of typed independent stream variables;

  • a collection O of typed dependent stream variables; and

  • a collections E of defining expressions, with one expression \(E_y\in Expr (I\cup {}O)\) for each output variable \(y\in O\), where y and \(E_y\) must have the same type.

We write \(y:=E_y\) to denote that the stream y is defined by its defining expression \(E_y\), which can use every stream variable in \(I\cup O\) (including y itself) as atomic terms. Sometimes, Lola specifications include a collection of triggers defined by expressions of type Bool over the stream variables, with the intended meaning of informing the user when the corresponding expressions become true, but we do not use triggers in the presentation in this paper.

Independent variables refer to input streams and dependent variables refer to output streams. It is often convenient to partition the dependent variables into output variables and intermediate variables to distinguish streams that are of interest to the user from those that are used only to facilitate the description of other streams. However, for the semantics and the algorithm this distinction is not important, and hence we will ignore this classification in the rest of the paper.

Example 1

Let \(x_1\) and \(x_2\) be stream variables of type Boolean and \(x_3\) be a stream variable of type integer. The following is an example of a Lola specification with \(I=\{x_1, x_2, x_3\}\) as independent variables, \(O=\{y_1,\ldots ,y_{10}\}\) as dependent variables and the following defining equations:

$$ \begin{array}{lcl@{}lcl} y_1 &{} := &{} \mathbf {true} &{} y_6 &{} := &{} \texttt {if} \;\;x_1\;\;\texttt {then} \;\;x_3\le y_4\;\;\texttt {else} \;\;\lnot y_3 \\ y_2 &{} := &{} x_3 &{} y_7 &{} := &{} x_1[+1, false ] \\ y_3 &{} := &{} x_1 \vee (x_3 \le 1) &{} y_8 &{} := &{} x_1[-1, true ] \\ y_4 &{} := &{} ((x_3)^2 + 7)\ mod \ 15 &{} y_9 &{} := &{} y_{9}[-1, 0] + (x_3\ mod \ 2) \\ y_5 &{} := &{} \texttt {if} \;\;y_3\;\;\texttt {then} \;\;y_4\;\;\texttt {else} \;\;y_4+1 &{} y_{10} &{} := &{} x_2\ \vee \ (x_1\ \wedge \ y_{10}[1, true ]) \end{array} $$

Stream variable \(y_1\) denotes a stream whose value is \( true \) at all positions, while \(y_2\) denotes a stream whose values are the same at all positions as those in \(x_3\). The values of the streams corresponding to \(y_3,\ldots ,y_6\) are obtained by evaluating their defining expressions place-wise at each position. The stream corresponding to \(y_7\) is obtained by taking at each position i the value of the stream corresponding to \(x_1\) at position \(i+1\), except at the last position, which assumes the default value \( false \). Similarly for the stream for \(y_8\), whose values are equal to the values of the stream for \(x_1\) shifted by one position, except that the value at the first position is the default value \( true \). The stream specified by \(y_9\) counts the number of odd entries in the stream assigned to \(x_3\) by accumulating \((x_3\ mod \ 2)\). Finally, \(y_{10}\) denotes the stream that gives at each position the value of the temporal formula \(x_1\mathbin {\mathcal {U}}x_2\) with the stipulation that unresolved eventualities be regarded as satisfied at the end of the trace.    \(\square \)

To present formal results, it is sometimes convenient to work with a simpler class of specifications.

Definition 3

(Flat). A specification is flat if each defining expression \(E_y\) is one of the following

  • A constant c

  • A stream variable v

  • A constructor over stream variables \(f(v_1,\ldots ,v_n)\)

  • An offset expression v[kc].

Definition 4

(Normalized). A specification is normalized if it only contains offsets 1 or \(-1\).

Any Lola specification can be converted into a flat specification by introducing extra stream variables as place-holders for complex sub-expressions. Similarly, any Lola specification can be converted into a normalized specification by introducing additional stream variables defined to carry value \(n-1\) for offsets of \(n>1\) (and \(n+1\) for offsets of \(n<-1\)). This transformation also preserves flatness so every Lola specification can be converted into a normalized flat specification.

Example 2

Consider the Lola specification with \(I=\{x_1,\ldots ,x_5\}\), \(O=\{y\}\) and

$$ y := x_1[1,0] + \texttt {if} \;x_2[-1, true ]\;\texttt {then} \;x_3\;\texttt {else} \;x_4 + x_5. $$

The normalized specification uses \(O=\{y,y_1,\ldots ,y_4\}\) with equations:

$$ \begin{array}{lll} y \; := \; y_1 + y_2 \;\;\;\; &{} \;\;\;\; \begin{array}{lcl} y_1 &{} := &{} x_1[1,0] \\ y_2 &{} := &{} \texttt {if} \;\;y_3\;\;\texttt {then} \;\;x_3\;\;\texttt {else} \;\;y_4 \end{array} \;\;\;\; &{} \;\;\;\; \begin{array}{lcl} y_3 &{} := &{} x_2[-1, true ] \\ y_4 &{} := &{} x_4 + x_5 \end{array} \end{array} $$

   \(\square \)

2.2 Specification Language: Semantics

In order to define the semantics of SRV specifications we first define how to evaluate expressions. Consider a map \(\sigma _I\) that assigns one stream \(\sigma _x\) of type T and length N for each input stream variable x of type T, and a map \(\sigma _O:\{\ldots ,\sigma _y,\ldots \}\) that contains one stream \(\sigma _y\) of length N for each defined stream variable y (again of the same type as y). We call \((\sigma _I,\sigma _O)\) an interpretation of \(\varphi \), and use \(\sigma \) as the map that assigns the corresponding stream as \(\sigma _I\) or \(\sigma _O\) (depending on whether the stream variable is an input variable or an output variable).

Definition 5

(Valuation). Given an interpretation \((\sigma _I,\sigma _O)\) a valuation is a map \(\llbracket \cdot \rrbracket \) that assigns to each expression a stream of length N of the type of the expression as follows:

$$ \begin{array}{lcl} \llbracket c\rrbracket (j) &{} = &{} c \\ \llbracket v\rrbracket (j) &{}\; = \; &{} \sigma _v(j) \\ \llbracket f(e_1,\ldots ,e_k)\rrbracket (j) &{} = &{} f(\llbracket e_1\rrbracket (j),\ldots ,\llbracket e_k\rrbracket (j)) \\ \llbracket \texttt {if} \;e_1\;\texttt {then} \;e_2\;\texttt {else} \;e_3\rrbracket (j) &{} = &{} \texttt {if} \;\llbracket e_1\rrbracket (j)\;\texttt {then} \;\llbracket e_2\rrbracket (j)\;\texttt {else} \;\llbracket e_3\rrbracket (j) \\ \llbracket v[k,c]\rrbracket (j) &{} = &{} {\left\{ \begin{array}{ll} \llbracket v\rrbracket (j+k) &{} if \; 0 \le j+k < N \\ c &{} otherwise \end{array}\right. } \end{array} $$

We now can define when an interpretation \((\sigma _I,\sigma _O)\) of \(\varphi \) is an evaluation model, which gives denotational semantics to Lola specifications.

Definition 6

(Evaluation Model). An interpretation \((\sigma _I,\sigma _O)\) of \(\varphi \) is an evaluation model of \(\varphi \) whenever

$$ \llbracket y\rrbracket =\llbracket E_y\rrbracket \;\;\;\; \text {for every}\,\, y\in {}O $$

In this case we write \((\sigma _I,\sigma _O)\models \varphi \).

For a given set of input streams, a Lola specification may have zero, one, or multiple evaluation models.

Example 3

Consider the Lola specifications (all with \(I=\{x\}\) and \(O=\{y\}\)) where x has type Nat and y has type Bool.

$$ \begin{array}{lccl} \varphi _1: &{} y &{} := &{} (x \le 10) \\ \varphi _2: &{} y &{} := &{} y \wedge (x \le 10) \\ \varphi _3: &{} y &{} := &{} \lnot y \end{array} $$

For any given input stream \(\sigma _x\), \(\varphi _1\) has exactly one evaluation model \((\sigma _x,\sigma _y)\), where \(\sigma _y(i) = true \) if and only if \(\sigma _x(i) \le 10\), for \(1\le i\le N\). The specification \(\varphi _2\), however, may give rise to multiple evaluation models for a given input stream. For example, for input stream \(\sigma _x: \left\langle 0,15,7,18 \right\rangle \), both \(\sigma _y: \left\langle false , false , false , false \right\rangle \) and \(\sigma _y: \left\langle false , true , false , true \right\rangle \) make \((\sigma _x,\sigma _y)\) an evaluation model of \(\varphi _2\). The specification \(\varphi _3\), on the other hand, has no evaluation models, because there is no solution to the equations \(\sigma _y(i) = \lnot \sigma _y(i)\).    \(\square \)

2.3 Well-Definedness and Well-Formedness

SRV specifications are meant to define monitors, which intuitively correspond to queries of observations of the system under analysis (input streams) for which we want to compute a unique answer (the output streams). Therefore, the intention of a specification is to define a function from input streams to output streams, and this requires that there is a unique evaluation model for each instance of the input streams. The following definition captures this intuition.

Definition 7

(Well-defined). A Lola specification \(\varphi \) is well-defined if for any set of appropriately typed input streams \(\sigma _I\) of the same length \(N>0\), there exists a unique valuation \(\sigma _O\) of the defined streams such that \((\sigma _I,\sigma _O)\models \varphi \).

A well-defined Lola specification maps a set of input streams to a unique set of output streams. Unfortunately well-definedness is a semantic condition that is hard to check in general (even undecidable for rich types). Therefore, we define a more restrictive (syntactic) condition called well-formedness, that can be easily checked on every specification \(\varphi \) and implies well-definedness. We first add an auxiliary definition.

Definition 8

(Dependency Graph). Let \(\varphi \) be a Lola specification. The dependency graph for \(\varphi \) is the weighted directed multi-graph \(D= \left\langle V,E \right\rangle \), with vertex set \(V=I\cup O\). The set E contains an edge \(y\overset{0}{\rightarrow } v\) if v is occurs in \(E_y\) and an edge \(y\overset{k}{\rightarrow } v\) if v[kd] occurs in \(E_y\).

An edge \(y\overset{k}{\rightarrow } v\) encode that y at a particular position potentially depends on the value of v, offset by k positions. Note that there can be multiple edges between x and y with different weights on each edge. Also note that vertices that correspond to input variables do not have outgoing edges.

A walk of a graph is a sequence \(v_1\overset{k_1}{\rightarrow }v_2\overset{k_2}{\rightarrow }v_3\cdots {}v_n\overset{k_n}{\rightarrow }v_{n+1}\) of vertices and edges. A walk is closed if \(v_1=v_{n+1}\). The weight of a walk is the sum of the weights of its edges. A simple walk is a walk in which no vertex is repeated. A cycle is a simple closed walk.

Definition 9

(Well-Formed Specifications). A Lola specification \(\varphi \) is well-formed if its dependency graph has no closed walk with weight zero.

Example 4

Consider the Lola specification with \(I:\{x_1,x_2\}\) and \(O:\{y_1,y_2\}\) and the following defining equations:

$$ \begin{array}{rcl} y_1 &{}:=&{} y_2[1,0]+ \texttt {if} \;(y_2[-1,7] \le x_1[1,0])\;\texttt {then} \;(y_2[-1,0])\;\texttt {else} \;y_2\\ y_2 &{}:=&{} (y_1 + x_2[-2,1]).\\ \end{array} $$

Its normalized specification is

$$ \begin{array}{lclclclclcl} y_1 &{} := &{} y_{5} + y_{9} &{} \qquad \quad &{} y_2 &{} := &{} y_1 + y_{4} &{} \qquad \quad &{} y_{3} &{} := &{} x_1[1,0] \\ y_{4} &{} := &{} x_2[-2,1] &{}&{} y_{5} &{} := &{} y_2[1,0] &{}&{} y_{6} &{} := &{} y_2[-1,0] \\ y_{7} &{} := &{} y_2[-1,7] &{}&{} y_{8} &{} := &{} y_{7} \le y_{3} &{}&{} y_{9} &{} := &{} \texttt {if} \;y_{8}\;\texttt {then} \;y_{6}\;\texttt {else} \;y_2 \end{array} $$

The dependency graph of the normalized specifications is:

figure a

This specification has a zero-weight closed walk, namely \(y_1\xrightarrow {0}y_{9}\xrightarrow {0}y_2\xrightarrow {0}y_1\), and hence the specification is not well-formed.    \(\square \)

To prove that well-formedness implies well-definedness, we first define the notion of an evaluation graph which captures the dependencies for a given input length N.

Definition 10

(Evaluation Graph). Given a specification \(\varphi \) and a length N, the evaluation graph is the directed graph \(G_N: \left\langle V,E \right\rangle \) where V contains one vertex \(v^j\) for each position j of each stream variable v and

  • there is an edge \(y^j\rightarrow {}v^j\) if \(E_y\) contains v as an atom, and

  • there is an edge \(y^j\rightarrow {}v^{j+k}\) if \(E_y\) contains v[kc] and \(0\le j+k < N\).

The vertices \(v^j\) are called position variables as they encode the value of stream variable v at position j. We will prove later that a specification is guaranteed to be well-defined if no evaluation graph for any length contains a cycle, because in this case the value of each position variable can be uniquely determined. The following lemma relates this acyclicity notion with the absence of zero-weight cycles in the dependency graph.

Lemma 1

Let \(\varphi \) be a specification with dependency graph D, let N be a trace length and \(G_N\) the explicit dependency graph. If \(G_N\) has a cycle then D has a zero-weight closed walk.

Proof

Assume \(G_N\) has a cycle

$$ y_1^{j_1} \rightarrow y_2^{j_2} \rightarrow \cdots \rightarrow y_{k}^{j_k} \rightarrow y_{1}^{j_1}$$

The corresponding closed walk in D is

$$ y_{1} \overset{j_2-j_1}{\longrightarrow } y_{2} \rightarrow \cdots \rightarrow y_{k} \overset{j_1-j_k}{\longrightarrow } y_{1} $$

with weight \( \sum _{i=1}^k (j_{i\oplus 1} - j_i ) = 0\).    \(\square \)

Note that the closed walk induced in D needs not be a cycle since some of the intermediate nodes may be repeated, if they correspond to the same \(y_k\) for different position j.

Lemma 2

Let \(\varphi \) be a specification and N a length. If \(G_N\) has no cycles, then for every tuple \(\sigma _I\) of input streams of length N, there is a unique evaluation model.

Proof

Assume \(G_N\) has no cycles, so \(G_N\) is a DAG. Then we can define a topological order > on \(G_N\) by taking the transitive closure of \(\rightarrow \). We prove by induction on this order that the value of each vertex is uniquely determined, either because this value is obtained directly from an input stream or constant value in the specification, or because the value can be computed from values computed before according to >.

For the base case, the value of a vertex \(v^j\) without outgoing edges does not depend on other streams. The only possible value is either the value of an input stream (if v is an input stream variable) at position j, or a value obtained from an equation with no variables or offsets as atoms. In all these cases the value is uniquely determined.

For the inductive case, the value of \(v^j\) can be computed uniquely from the values of its adjacent vertices in \(G_N\). Indeed, by Definition 10, if the value of \(v^j\) depends on the value of \(v^k\) then there exists an edge \(v^j\rightarrow {}v^k\) in \(G_N\) and thus \(v^j>v^k\) and, by the inductive hypothesis, the value of \(v^k\) is uniquely determined. Then, since every atom in \(\llbracket E_v\rrbracket (j)\) is uniquely determined, the value of \(\llbracket E_v\rrbracket (j)\) is uniquely determined. Since this value has been computed only from inputs, this is the only possible value for \(\sigma _v(j)\) to form an evaluation model.    \(\square \)

Consider now a well-formed specification \(\varphi \). Then, by Lemma 1, no evaluation graph has cycles, and thus by Lemma 2 for every set of input streams, there exists a unique solution for the output streams, and hence there is exactly one evaluation model.

Theorem 1

Every well-formed Lola specification is well-defined.

Note that the converse of Theorem 1 does not hold. First, the absence of cycles in \(G_N\) does not imply the absence of a zero-weight closed walk in D. For example, the evaluation graph for the specification

$$ \begin{array}{lcl} y_1 &{} := &{} y_2[-k,c] \\ y_2 &{} := &{} y_1[k,c] \end{array} $$

for \(N<k\) has no cycles (since it has no edges), but it is easy to see that D has a zero-weight closed walk. Second, a cycle in \(G_N\) does not necessarily imply that \(\varphi \) is not well-defined. For example, the evaluation graph of the specification

$$ \begin{array}{lcl} y &{} := &{} (z \ \vee \ \lnot z) \ \wedge \ x \\ z &{} := &{} y \end{array} $$

has a cycle for all N, but for every input stream, \(\varphi \) has exactly one evaluation model, namely \(\sigma _y=\sigma _z=\sigma _x\), and thus, by definition, the specification is well-defined.

2.4 Checking Well-Formedness

A Lola specification \(\varphi \) is well-formed if its dependency graph D has no closed walks, so checking well-formedness is reduced to construct D and check for closed walks. In turn, this can be reduced to checking for cycles as follows.

Let a gez-cycle be a cycle in which the sum of the weight of the edges is greater than or equal to zero, and let a gz-cycle be a cycle in which the sum of the weight of the edges is strictly greater than zero. Similarly, a lez-cycle is a cycle where the sum is less than or equal zero and a lz-cycle is one where the sum is less than zero. The reduction is based on the observation that a graph has a zero-weight closed walk if and only if it has a maximally strongly component (MSCC) with both a gez-cycle and a lez-cycle.

Lemma 3

A weighted and directed multigraph D has a zero-weight closed walk if and only if it has a vertex v that lies on both a gez-cycle and a lez-cycle.

Proof

(\(\Rightarrow \)) Assume v is part of gez-cycle \(C_1\) and lez-cycle \(C_2\), with weights \(w_1\ge 0\) and \(w_2\le 0\), respectively. The closed walk consisting of traversing \(w_1\) times \(C_2\) and then traversing \(|w_2|\) times \(C_1\) has weight \(w_1w_2 + |w_2|w_1 = 0\), as desired.

(\(\Leftarrow \)) Assume D has a zero-weight closed walk. If D has a zero-weight cycle C we are done, as C is both a gez-cycle and a lez-cycle and any vertex in C has the desired property.

For the other case, assume D has no zero-weight cycles. It is easy to show by induction in the length of W that every closed walk can be decomposed into cycles that share one vertex. If one of these cycles is a lez-cycle or a gez-cycle the result follows. Now, not all the cycles can be strictly positive, because then the total weight of W would not be zero. Consequently there must a positive cycle and a negative cycle, and therefore there must be two consecutive cycles \(C_1\) and \(C_2\) that share one node and \(C_1\) is positive and \(C_2\) is negative.    \(\square \)

Theorem 2

A directed weighted multigraph D has no zero-weight closed walk if and only if every MSCC has only gz-cycles or only lz-cycles.

Proof

(\(\Rightarrow \)) Consider an arbitrary MSCC with only gz-cycles (the case for only lz-cycles is analogous). By the proof of Lemma 3, a closed walk is the multiset union of one or more cycles with weight the sum of the weights of the cycles. Hence the weight of any closed walk within the MSCC must be strictly greater than zero. Since any closed walk must stay within an MSCC, the weight of any closed walk must be strictly greater than zero.

(\(\Leftarrow \)) Assume D has no zero-weight closed walk. Then, by Lemma 3, D has no vertex that lies on both a gez-cycle and a lez-cycle. Suppose D has an MSCC with a gz-cycle \(C_1\) and a lz-cycle \(C_2\). Consider an arbitrary vertex \(v_1\) on \(C_1\) and \(v_2\) on \(C_2\). If \(v_1=v_2=v\), v lies on both a gez-cycle and a lez-cycle, a contradiction. If \(v_1\not = v_2\), since \(v_1\) and \(v_2\) are in the same MSCC, there exists a cycle \(C_3\) that contains both \(v_1\) and \(v_2\). \(C_3\) is either a zero-weight cycle, a gz-cycle or a lz-cycle. In all three cases either \(v_1\) or \(v_2\) or both lie on both a gez-cycle and a lez-cycle, a contradiction.    \(\square \)

Thus to check well-formedness of a SRV specification \(\varphi \) it is sufficient to check that each MSCC in G has only gz-cycles or only lz-cycles. This can be checked efficiently, even for large dependency graphs.

3 Online Monitoring

We distinguish two situations for monitoring—online and offline monitoring. In online monitoring, the traces from the system under observation are received as the system run, and the monitor works in tandem with the system. This leads to the following restriction for online monitoring: the traces are available a few points at a time starting at the initial instant on-wards, and need to be processed to make way for more incoming data. In particular, random access to the traces is not available. The length of the trace is assumed to be unknown upfront and very large.

In offline monitoring, on the other hand, we assume that the system has run to completion and the trace of data has been dumped to a storage device. Offline monitoring is covered in Sect. 4.

3.1 Monitoring Algorithm

We start by exhibiting a general monitoring algorithm for arbitrary Lola specifications, and then study its efficiency. Let \(\varphi \) be a Lola specification with independent stream variables I, dependent stream variables O and defining expressions E. Let j be the current position, at which the latest data is available from all input streams. The monitoring algorithm maintains two sets of equations as storage:

  • Resolved equations R of the form \((v^k,c)\) for a given position variable \(v^k\) (with \(k\in \{1,\ldots ,j\}\)) and concrete value c.

  • Unresolved equations U of the form \((y^k, e)\) for position variable \(y^k\) expression e (for e different from a constant).

An equation \((v^k,c)\) stored in R denotes that stream variable v at position k in the trace has been determined to have value c. This happens in two cases: input streams whose reading has been performed, and dependent stream variables whose value has been computed. Equations in U relate position variables \(y^k\)—where y is a dependent stream variable—with a (possibly partially simplified) expression over position variables whose values have not yet been determined. Note that if \((y^k,e)\) is in U then e must necessarily contain at least one position variable, because otherwise e is a ground expression and the interpreted functions from the domain can transform e into a value.

The monitoring algorithm is shown in Algorithm 1. After initializing the U and R stores to empty and j to 0, the monitoring algorithm executes repeatedly the main loop (lines 5 to 11). This main loop first reads values for all inputs at the current position and adds these values to R (line 6). Then, it instantiates the defining equations for all outputs and adds these to U (line 7). Finally, it propagates new known values \((v^k,c)\) in R by substituting all occurrences of \(v^k\) in unresolved equations by c and then simplifies resulting equations (procedure Propagate). This procedure simply uses all the information in R to substitute occurrences of known values in unresolved equations. In some cases, these equations become resolved (the term becomes a value) and the corresponding pair is moved to R (lines 23 and 24). Then, the procedure Prune is used to eliminate unnecessary information from R as described below. Finally, procedure Finalize is invoked at the end of the trace. This procedure is used to determine whether a given offset expression that remains in an unresolved equation falls beyond the end of the trace, which is converted into its default value. This procedure also performs a final call to Propagate, which is guaranteed (see below) to resolve all position variables, and therefore U becomes empty.

figure b

Procedure Inst, shown in Algorithm 2, instantiates the defining equation for v into the corresponding equation for \(v^j\) at given position j by propagating the value into the atomic stream variable references and offsets atoms, which become instance variables. Note that the default value c is recorded in line 57 in case the computed position \(k+j\) falls beyond the end of the trace N, which is not known at the point of the instantiation. Whether \(k+j\) is inside the trace will be determined after k steps or resolved by Finalize.

figure c

We show now how the resolved storage R can be pruned by removing information that is no longer necessary. The back reference distance of a stream variable represents the maximum time steps that its value needs to be remembered.

Definition 11

(Back Reference Distance). Given a specification \(\varphi \) with dependency graph D the back reference distance \(\nabla v\) of a vertex v is

$$ \nabla v = max(0, \left\{ k\ | \; s\overset{-k}{\rightarrow }v\in E \right\} ) $$

Example 5

We illustrate the use of back reference distances for pruning R (lines 31 and 32) revisiting Example 4. The back reference distances are \(\nabla y_1 = \nabla y_{10} = \nabla y_{11} = \nabla y_{12} = \nabla y_{13} = \nabla y_{14} = \nabla y_{15} = \nabla y_{16} = \nabla x_1 = 0\) and \(\nabla y_2 = \nabla x_2 = 2\). Consequently, all equations \((v^j,c)\) are removed from R in the same time step that they are entered in R, except for \(y_2^j\) and \(x_2^j\), which must remain in R for two time steps until instant \(j+2\).    \(\square \)

Example 6

Consider the following specification

which computes \(p\mathbin {\mathcal {U}}q\). For input streams \(\sigma _p : \left\langle false , false , true , false \right\rangle \) and \(\sigma _q : \left\langle true , false , false , false \right\rangle \) the equations in stores R and U at the completion of step (3) of the algorithm at each position are:

Since the back reference distance of all stream variables is 0, all equations can be removed from R at each position.    \(\square \)

Theorem 3

(Correctness). Let \(\varphi \) be a specification and \(\sigma _I\) be input streams of length N. If \(\varphi \) is well-formed, then Algorithm 1 computes the unique evaluation model of \(\varphi \) for \(\sigma _I\). That is, at the end of the trace the unique value has been computed for each \(y^k\), and U is empty.

Proof

Assume \(\varphi \) is well-formed. By Definition 9 the dependency graph D has no zero-weight closed walks and hence by Lemma 1, the evaluation graph \(G_N\) has no cycles, and we can define a topological order < in \(G_N\).

As in the proof of Lemma 1, every vertex of \(G_N\) can be mapped to the corresponding value of the unique evaluation model. We prove by induction on \(G_N\) that at the end of the trace each of these values has been computed and that each value has been available in R at some point \(j< N\) during the computation.

For the base case, leaf vertices \(v^j\) correspond to either input stream variables or values from equations of the form \(x=c\) or \(x=y[k,c]\) such that \(j+k< 0\). In both cases the value is uniquely obtained and the corresponding equation is added to R.

For the inductive case, the value for vertex \(v^j\) is uniquely computed from the values for vertices \(w^k\) such that \(v^j\rightarrow w^k\), and hence \(w^k<v^j\) and by the inductive hypothesis, the value for \(w^k\) is uniquely computed or obtained and is at some point available in R. It remains to be shown that these values are available in R for substitution. We distinguish three cases:

  1. 1.

    \(j=k\). In this case \((v^j,e)\) and \((w^k,e')\) are added to U (or R) at position j (either in line 6 or in line 7). If \((w^k,e')\) is added to R, the value of \(w^k\) in e is substituted in \(e'\) in line 19. If \((w^k,e')\) is added to U, by the inductive hypothesis, it is available at some later point in the computation. Then it must be moved to R in line 23, and hence in the same step it is substituted in e.

  2. 2.

    \(j<k\). In this case \((w^k,e')\) is added to U (or R) after \((v^j,e)\) is added to U. Again, by the inductive hypothesis, \((w^k,c)\) will be resolved and become available in R at some position \(l<N\) and thus at that same position is substituted in e if \((v^j,e)\) is still in U.

  3. 3.

    \(j>k\). In this case \(E_v\) contains w[ic] and thus \(k=j+i\) (i.e. \(i<0\)). Now, \((w^k,c)\) is added to R or U before \((v^j,e)\) is added to U. Again, by the inductive hypothesis, \(w^k\) will be resolved at some position \(l\le N\), which must be after k. By the definition of k, \((w^k,c)\) will be in R at least until \(k+\nabla w\) which is guaranteed to be at j or after and hence be available when \(v^j\) is added to U.

This finishes the proof.    \(\square \)

3.2 Efficiently Monitorable Specifications

In the general case the algorithm Monitor described above is linear in both time and space in the length of the trace and the size of the specification. In these bounds, we assume that the value of a type can be stored in a single register of the type, and that a single function is computed in a single step.

In online monitoring, since the traces are assumed to be large, it is generally assumed that a specification can be monitored efficiently only if the memory requirements are independent of the trace length.

Example 7

Consider the following specification with \(I=\{x\}\) and \(O=\{y,\textit{last},w,z\}\):

$$ \begin{array}{ll} \begin{array}{lcl} y &{} := &{} false \\ \textit{last}&{} := &{} y[1, true ] \\ w &{} := &{} z[1,0] \\ z &{} := &{} \texttt {if} \;\textit{last}\;\texttt {then} \;x\;\texttt {else} \;w \end{array} \end{array} $$

For the input stream \(\sigma _x\ \left\langle 37,31,79,17,14 \right\rangle \) the unique evaluation model is

In general, for any input stream \(\sigma _x\), output stream \(\sigma _z\) has all its values equal to the last value of \(\sigma _x\). However, for all j, equations

$$ (w^j,z_0^{j+1}) \;\;\;\text { and }\;\;\; (z^j,\texttt {if} \;\textit{last}^j\;\texttt {then} \;x^j\;\texttt {else} \;w^j) $$

remain unresolved until the end of the trace, and thus the memory requirements of Algorithm 1 for this specification are linear in the length of the trace.    \(\square \)

The worst-case memory usage of a Lola specification for a given trace length can be derived from the evaluation graph with the aid of the following definitions.

Definition 12

(Fan and Latency). The fan of a vertex \(v^j\) of an evaluation graph \(G_N\) is the set of vertices reachable \(v^j\):

$$ \textit{fan}(v^j) \,{\mathop {=}\limits ^{\text {def}}}\, \left\{ w^k \ | \ v^j\rightarrow ^* w^k \right\} $$

The latency of a position variable \(v^j\) is the difference between j and the position of the furthest vertex in \(\textit{fan}(v^j)\):

$$ \textit{lat}(v^j) \,{\mathop {=}\limits ^{\text {def}}}\, max \left( 0, \left\{ k \ | \ w^{j+k}\in \textit{fan}(v^j) \right\} \right) . $$

The fan of \(v^j\) is an over-approximation of the set of vertices on which the value of \(v^j\) depends. The latency is an upper-bound on the number of trace steps it takes before a value at a given position is guaranteed to be resolved.

Theorem 4

If a vertex \(v^j\) has latency k, then the corresponding equation \((v^j,e)\) will be fully resolved by Monitor at or before step \(j+k\).

Proof

Since the specification is well-formed the evaluation graph is acyclic. We show the results by induction on a topological order < of the evaluation graph. Note that if \(v^j\rightarrow w^i\) then \(\textit{lat}(v^j)\ge \textit{lat}(w^i)\) directly by the definition of latency. Then, at position \(j+k\) it is guaranteed that \(w^i\) is resolved. Since all atoms in the expression e of equation \((v^j,e)\) are resolved at \(j+k\) or before, the corresponding values are substituted in e (line 19) at step \(j+k\) or before, so e is simplified into a value at \(j+k\) or before.    \(\square \)

Example 8

Consider again the specification of Example 7. The latency of \(z^2\) is \(N-2\), so equations for \(z^2\) may reside in U for \(N-2\) steps, so this specification cannot be monitored online in a trace-length independent manner.    \(\square \)

Definition 13 (Efficiently Monitorable)

A Lola specification is efficiently monitorable if the worst case memory usage of Monitor is independent of the length of the trace.

Some specifications that are not efficiently monitorable may be rewritten into equivalent efficiently monitorable form, as illustrated by the following example.

Example 9

Consider the specification “Every request must be eventually followed by a grant before the trace ends”, expressed as \(\varphi _1\) as follows:

$$ \begin{array}{lcl} reqgrant &{} := &{} \texttt {if} \; request \;\texttt {then} \; evgrant \;\texttt {else} \; true \\ evgrant &{} := &{} grant \ \mathrel {\vee } nextgrant \\ nextgrant &{} := &{} evgrant [1, false ] \end{array} $$

This specification encodes the temporal assertion . Essentially, \( evgrant \) captures and \( reqgrant \) corresponds to (see [12] and [10] for a description of translation from LTL to Boolean SRV). An alternative specification \(\varphi _2\) of the same property is

It is easy to see that, for the same input, is true at the end of the trace (for \(\varphi _2\)) if and only if \(\lnot nextgrant \) is true at the beginning of the trace for \(\varphi _1\). Hence, both specifications can report a violation at the end of the trace if a request was not granted. The second specification, however, is efficiently monitorable, while the first one is not.    \(\square \)

Similar to the notion of well-definedness, checking whether a specification is efficiently monitorable is a semantic condition and cannot be checked easily in general. Therefore we define a syntactic condition based on the dependency graph of a specification that guarantees that a specification is efficiently monitorable.

Definition 14

(Future Bounded). A well-formed specification \(\varphi \) is future bounded if its dependency graph D has no positive-weight cycles.

We show that every future bounded specification is efficiently monitorable by showing that in the absence of positive-weight cycles every vertex in the dependency graph can be mapped to a non-negative integer that provides an upper-bound on the number of trace steps required to resolve the equation for the corresponding instance variable.

Definition 15

(Look-ahead Distance). Given a future bounded specification with dependency graph D, the look-ahead distance \(\varDelta v\) of a vertex v is the maximum weight of a walk starting from v (or zero if the maximum weight is negative).

Note that the look-ahead distance is well defined only in the absence of positive-weight cycles. The look-ahead distance of a vertex can be computed easily using shortest path traversals on the dependency graph D.

Example 10

Consider the specification

The dependency graph D of this specification is:

figure g

Consequently, the values of the look-ahead distance are:

$$ \begin{array}{lll} \begin{array}{lcl} \varDelta y_1 &{} = &{} 1 \\ \varDelta y_2 &{} = &{} 3 \\ \varDelta y_3 &{} = &{} 7 \end{array} \qquad \quad &{} \begin{array}{lcl} \varDelta y_4 &{} = &{} 1 \\ \varDelta y_5 &{} = &{} 0 \\ \varDelta y_6 &{} = &{} 3 \end{array} \qquad \quad &{} \begin{array}{lcl} \varDelta y_7 &{} = &{} 2 \\ \varDelta y_8 &{} = &{} 0 \\ \varDelta y_9 &{} = &{} 7 \end{array} \end{array} $$

which are easily computer from D.   \(\square \)

The look-ahead distance provides an upper-bound on the number of equations that may simultaneously be in U.

Lemma 4

For every vertex \(v^j\) in an evaluation graph \(G_N\) of a future bounded specification \(\textit{lat}(v^j) \le \varDelta c\).

Proof

Consider a vertex \(v^j\) in an evaluation graph, with latency \(\textit{lat}(v,j)=d\). Then, there exists a sequence of vertices

$$ v^j \rightarrow y_1^{j_1} \rightarrow \ldots \rightarrow y_n^{j_n} $$

with \(j_n-j=d\). The walk in the dependency graph of the corresponding vertices

$$ v \overset{j_1-j}{\longrightarrow }y_1\overset{j_2-j_1}{\longrightarrow }\ldots \overset{j_n-j_{n-1}}{\longrightarrow }y_n $$

has total weight

$$ \sum _{i=1}^n j_{i+1}-j_i\ =\ j_n - j_i\ =\ d $$

and hence \(\textit{lat}(v^j)\le \varDelta v\).    \(\square \)

Theorem 5

(Memory Requirements). Let \(\varphi \) be a future bounded specification. Algorithm 1 requires to store in U and R, at any point in time, a number of equations linear in the size of \(\varphi \).

Proof

From the description of the algorithm and Lemma 4 it follows that the maximum number of equations in U is less than or equal to

$$ \sum _{y\in O} \varDelta y \ + |O| $$

where the second term reflects that all equations for the dependent variables are first stored in U in line 7 and after simplification moved to R in line 23.

Moreover, the maximum number of equations stored in R is bounded by \(\nabla v\) and the number of stream variables v.    \(\square \)

Example 11

Consider again the specification of Example 10. The back reference distance is 0 for all variables except for \(x_2\) and \(y_3\), which are \(\nabla x_2 = 1, \nabla y_3=7\). Hence, at the end of every main loop, R only contains one instance of \(x_2\) and seven instances of \(y_3\). Additionally, the look-ahead distance of a stream variable v bounds linearly the number of instances of v in U.    \(\square \)

Corollary 1

Every future-bounded specification is efficiently monitorable.

Note that the converse does not hold. In practice, it is usually possible to rewrite an online monitoring specification with a positive cycle into one without positive cycles, as illustrated in Example 9.

4 Offline Monitoring

In offline monitoring we assume that all trace data is available on tape, and therefore we can afford more flexibility in accessing the data. In this section we show that every well-formed SRV specification can be monitored efficiently offline, in contrast to online monitoring where we required that the dependency graph not have any positive-weight cycles. The reason why we can efficiently monitor in an offline manner all specifications is that we can perform both forward and backward passes over the trace. We will show that every well-formed specification can be decomposed into sub-specifications such that each sub-specification needs to be checked only once and can be done so efficiently by either traversing the trace in a forward or in a backward direction. In this manner, all values of the output streams of a sub-specification can written to tape and are accessible for subsequent traversals.

We first define the notions of reverse efficiently monitorable and its corresponding syntactic condition, past bounded, as the duals of efficiently monitorable and future bounded. A reverse monitoring algorithm RevMonitor can be easily obtained by initializing j to N (line 4) decreasing j (line 10), pruning j on the dual of the back-reference distance in line 31 and performing substitutions when the offset becomes negative (so Finalize is not necessary for reverse monitoring). This is essentially the same algorithm as Monitor but performing the index transformation \(j'=N-(j+1)\).

Definition 16

(Reverse Efficiently Monitorable). A Lola specification is reverse efficiently monitorable if its worst-case memory requirement when applying RevMonitor is independent of the length of the trace.

Definition 17

(Past Bounded). A well-formed Lola specification is past bounded if its dependency graph has no negative-weight cycles.

Lemma 5

Every past-bounded specification is reverse efficiently monitorable.

Proof

The dual of the argument for Corollary 1.   \(\square \)

We construct now an offline algorithm that can check a well-formed Lola specification in a sequence of forward and reverse passes over the tapes, such that the number of passes is linear in the size of the specification and each pass is trace-length independent.

Let \(\varphi \) be a well-formed specification with dependency graph D. From the definition of well-formedness it follows that D has no zero-weight cycles, so each MSCC consists of only negative-weight or only positive-weight cycles. Let \(G_M:\ \left\langle \left\{ V_p,V_n \right\} ,E_M \right\rangle \) be the graph induced by the MSCCs of D defined as follows. For each positive-weight MSCC in D there is a vertex in \(V_p\) and for each negative-weight MSCC in D there is a vertex in \(V_n\). For each edge between two MSCCs there is an edge in \(E_M\) connecting the corresponding vertices. Clearly, \(G_M\) is a DAG.

Now we assign each MSCC a stage that will determine the order of computing the output for each MSCC following the topological order of \(G_M\). Positive MSCCs will be assigned even numbers and negative MSCCs will be assigned odd numbers. Every MSCC will be assigned the lowest stage possible that is higher than that of all its descendants with opposite polarity. In other words, the stage of an MSCC v is at least the number of alternations in a path in \(G_M\) from v.

Formally, let the opposite descendants be defined as follows:

$$ \textit{op}(v)=\{ v' \;|\; (v,v')\in E_M^* \text { and } (v\in V_p \text { and } v'\in V_n\text {, or } v\in V_n\text { and } v'\in V_p) \} $$

Then,

$$ \textit{stage}(v)={\left\{ \begin{array}{ll} 0 &{} \text {if}\,\, \textit{op}(v)\,\, \text {is empty and}\,\, v\in V_n \\ 1 &{} \text {if}\,\, \textit{op}(v)\,\, \text {is empty and}\,\, v\in V_p \\ 1+ max \{\textit{stage}(v') \;|\; v'\in \textit{op}(v)\} &{} \text {otherwise} \end{array}\right. } $$

which can be computed following a topological order on \(G_M\). Each vertex v in \(G_M\) can be viewed as representing a sub-specification \(\varphi _v\) whose defining equations refer only to stream variables in sub-specifications with equal or lower stage processing order. Based on this processing order we construct the following algorithm.

figure h

Theorem 6

Given a well-formed specification, a trace can be monitored in time linear in the size of the specification and the length of the trace, with memory requirements linear in the size of the specification and independent of the length of the trace.

Proof

Follows directly from Lemmas 1 and 5 and Algorithm 3.    \(\square \)

Example 12

Figure 1 shows the dependency graph of a Lola specification and its decomposition into MSCCs, along with its induced graph \(G_M\) annotated with the processing order of the vertices. MSCCs \(G_1\) and \(G_4\) are positive, while \(G_2\) and \(G_5\) are negative. \(G_3\) is a single node MSCC with no edges, which can be chosen to be either positive or negative. The passes are: \(G_5\) is first monitored forward because it is efficiently monitorable. Then, \(G_4\) is monitored backwards. After that, \(G_3\) and \(G_2\) are monitored forward. Finally, \(G_1\) is monitored backwards.

Fig. 1.
figure 1

A dependency graph G and its MSCC induced graph \(G_M\).

5 Conclusions

We have revisited Stream Runtime Verification, a formalism for runtime verification based on expressing the functional relation between input streams and output streams, and we have presented in detail evaluation strategies for online and offline monitoring.

SRV allows both runtime verification of temporal specifications and collection of statistical measures that estimate coverage and specify complex temporal patterns. The Lola specification language is sufficiently expressive to specify properties of interest to applications like large scale testing, and engineers find the language easy to use. Even specifications with more than 200 variables could be constructed and understood relatively easily by engineers. Even though the language allows ill-defined specifications, SRV provides a syntactic condition that is easy to check and that guarantees well-definedness, using the notion of a dependency graph. Dependency graphs are also used to check whether a specification is efficiently monitorable online, that is, in space independent of the trace length. In practical applications most specifications of interest are in fact efficiently monitorable or can be rewritten into an efficiently monitorable fashion. We revisited the online algorithm for Lola specifications, and presented an algorithm for offline monitoring whose memory requirements are independent of the trace length for any well-formed specification.

The design of the Lola specification language was governed by ease of use by engineers. In runtime verification, unlike in static verification, one is free to choose Turing-complete specification languages. As a result researchers have explored the entire spectrum from temporal logics to programming languages. The advantage of a programming language in comparison with a temporal logic is that a declarative programming language is more familiar to engineers and large specifications are easier to write and understand. The disadvantage is that the semantics is usually tied to the evaluation strategy (typically in an informal implicit manner) and the complexity is hard to determine, while the semantics of a logic is independent of the evaluation strategy and upper bounds for its complexity are known. In practice, the choice is motivated by the intended use. Stream Runtime Verification is usually employed to facilitate the task of writing large specifications for engineers, so the natural choice was a programming-language. SRV retains, however, most of the advantages of a logic: the semantics is independent of the evaluation strategy and the efficiently monitorable specifications provide a clear bound on complexity. See [10] where we study decision procedures and complexities of decision problems for Boolean Stream Runtime Verification. For example, comparing Lola with specification languages at the other end of the spectrum, such as Eagle [3], Lola usually allows simpler specifications, as illustrated in Fig. 2.

Fig. 2.
figure 2

Comparison between the Lola and Eagle specification language