Keywords

1 Introduction

Model checking [9, 33] consists in proving that a model of a given program/computer system satisfies a temporal specificationFootnote 1. Traditionally, the model of the given program/computer system is a transition system and its semantics is the set of traces generated by the transition system. The temporal specification is usually one of the many variants of temporal logics such as the Linear Time Temporal logic (LTL), the Computation Tree Logic (CTL), or the combination CTL\({}^{*}\) of the two. The semantics of the temporal specification is a set of traces. The problem is therefore to check that the set of traces of the semantics of the given program/computer system is included in the set of traces of the semantics of the temporal specification. This is a Galois connection-based abstraction and so a model checking algorithm can be designed by calculus. To show that we consider a non-conventional temporal specification using regular expressions [25] and a structural fixpoint prefix-closed trace semantics which differs from the traditional small-step operational semantics specified by a transition system. There are properties of traces that are not expressible in temporal logic but are easily expressible using regular expressions [36].

2 Syntax and Trace Semantics of the Programming Language

Syntax Programs are a subset of C with the following context-free syntax.

figure a

A break exits the closest enclosing loop, if none this is a syntactic error. If is a program then is a valid C program. We call “[program] component” either a statement, a statement list, or a program. We let be the syntactic relation between immediate syntactic components. For example, if then , and .

Program labels Labels are not part of the language, but useful to discuss program points reached during execution. For each program component , we define

figure j

Prefix trace semantics Prefix traces are non-empty finite sequences of states where states are pairs of a program label designating the next action to be executed in the program and an environment assigning values to variables . A trace can be finite or infinite (recording a non-terminating computation) so . Trace concatenation is defined as follows

figure v

In pattern matching, we sometimes need the empty trace . For example if then and so .

Formal definition of the prefix trace semantics The prefix trace semantics is given structurally (by induction on the syntax) using fixpoints for the iteration.

  • The prefix traces of an assignment statement (where ) either stops in an initial state or is this initial state followed by the next state recording the assignment of the value of the arithmetic expression to variable x when reaching the label after the assignment.

    figure aj

    The value of an arithmetic expression in environment is :

    figure an
  • The prefix trace semantics of a break statement either stops at or goes on to the break label (which is defined as the exit label of the closest enclosing iteration).

    figure ar
  • The prefix trace semantics of a conditional statement is

  • either the trace when the observation of the execution stops on entry of the program component for initial environment ;

  • or, when the value of the boolean expression for is false , the initial state followed by the state at the label after the conditional statement;

  • or finally, when the value of the boolean expression for is true , the initial state followed by a prefix trace of starting in environment (and possibly ending .

figure bj

Observe that definition (4) includes the case of a conditional within an iteration and containing a break statement in the true branch . Since , from and , we infer that .

  • The prefix traces of the prefix trace semantics of a non-empty statement list are the prefix traces of or the finite maximal traces of followed by a prefix trace of .

    figure bu

    Notice that if then the last state of must be the first state of and this state is and so the trace must be a maximal terminating execution of i.e. is executed only if terminates.

  • The prefix finite trace semantic definition (6) of an iteration statement of the form where is the \(\subseteq \)-least solution to the equation . Since is \(\subseteq \)- monotone (if then and is a complete lattice, exists by Tarski’s fixpoint theorem and can be defined as the limit of iterates [15]. In definition (6) of the transformer , case (6.a) corresponds to a loop execution observation stopping on entry, (6.b) corresponds to an observation of a loop exiting after 0 or more iterations, and (6.c) corresponds to a loop execution observation that stops anywhere in the body after 0 or more iterations. This last case covers the case of an iteration terminated by a break statement (to after the iteration statement).Footnote 2

    figure cq
  • The other cases are similar.

Semantic properties As usual in abstract interpretation [16], we represent properties of entities in a universe by a subset of this universe. So a property of elements of belongs to . For example “to be a natural” is the property of the integers . The property “n is a natural” is “ ”. By program component (safety) property, we understand a property of their prefix trace semantics . So program properties belong to . The collecting semantics is the strongest program property, that is the singleton .

3 Specifying computations by regular expressions

Stephen Cole Kleene introduced regular expressions and finite automata to specify execution traces (called events) of automata (called nerve nets) [25]. Kleene proved in [25] that regular expressions and (non-deterministic) finite automata can describe exactly the same classes of languages (see [34, Ch. 1, Sect. 4]). He noted that not all computable execution traces of nerve nets can be (exactly) represented by a regular expression. The situation is the same for programs for which regular expressions (or equivalently finite automata) can specify a super-set of the prefix state trace semantics of program components .

Example 1 (Security monitors)

An example is Fred Schneider’s security monitors [29, 35] using a finite automata specification to state requirements of hardware or software systems. They have been used to check for safety program properties at runtime, that is for any given execution trace in the semantics . The safety property specified by the finite automaton or, equivalently, an regular expression is temporal i.e. links events occurring at different times in the computation (such as a file must be opened before being accessed and must eventually be closed).    \(\square \)

Syntax of regular expressions Classical regular expressions denote sets of strings using constants (empty string , literal characters a, b, etc.) and operator symbols (concatenation , alternation , repetition zero or more times or one or more times ). We replace the literal characters by invariant specifications stating that boolean expression should be true whenever control reaches any program point in the set of program labels. The boolean expression may depend on program variables and their initial values denoted where .

figure dp

We use abbreviations to designate sets of labels such as so that is invariant, so that is invariant at program label , when holds everywhere but at program point , etc.

Example 2

holds for any program. states that the value of x is always positive or zero during program execution. states that the value of x is always greater than or equal to its initial value during execution. states that the value of x should be positive or zero and if program point is ever reached then x should be 0, and if computations go on after program point then x should be negative afterwards.    \(\square \)

Example 3

Continuing Ex. 1 for security monitors, the basic regular expressions are names of program actions. We can understand such an action as designating the set of labels of all its occurrences in the program. If necessary, the boolean expression can be used to specify the parameters of the action.    \(\square \)

There are many regular expressions denoting the language containing only the empty sequence (such that , etc.), as shown by the following grammar.

figure em

For specification we use only non-empty regular expressions since traces cannot be empty.

figure eo

We also have to consider regular expressions containing no alternative .

figure er

Relational semantics of regular expressions The semantics (2) of expressions is changed as follows denotes the initial values of variables and their current value, is the alternative denial logical operation)

figure ex

We represent a non-empty finite sequence of states by a map (which is the empty sequence when \(n=0\)).

The relational semantics of regular expressions relates an arbitrary initial environment to a trace by defining how the states of the trace are related to that initial environment .

figure fi

Example 4

The semantics of the regular expression is .    \(\square \)

4 Definition of regular model checking

Let the prefix closure of a set of stateful traces be

figure fn

The following Def. 1 defines the model checking problem as checking that the semantics of the given program meets the regular specification for the initial environment Footnote 3.

figure fs

The prefix closure allows the regular specification to specify traces satisfying a prefix of the specification only, as in . The extension of the specification by allows for the regular specification to specify only a prefix of the traces, as in . Model checking is a boolean abstraction .

5 Properties of regular expressions

Equivalence of regular expressions We say that regular expressions are equivalent when they have the same semantics i.e. .

Disjunctive normal form of regular expressions As noticed by Kleene [25, p. 14], regular expressions can be put in the equivalent disjunctive normal form of Hilbert—Ackermann. A regular expression is in disjunctive normal form if it is of the form for some \(n\geqslant 1\), in which none of the , for \(1 \leqslant i \leqslant n\), contains an occurrence of . Any regular expression has a disjunctive normal form defined as follows.

figure gj

The Lem. 1 below shows that normalization leaves the semantics unchanged. It uses the fact that where the and do not contain any [23, Sect. 3.4.6, p.118]. It shows that normalization in (10) can be further simplified by and which have equivalent semantics.

Lemma 1

figure gq

.

and of regular expressions Janusz Brzozowski [7] introduced the notion of derivation for regular expressions (extended with arbitrary Boolean operations). The derivative of a regular expression with respect to a symbol a, typically denoted as or , is a regular expression given by a simple recursive definition on the syntactic structure of . The crucial property of these derivatives is that a string of the form \(a\sigma \) (starting with the symbol a) matches an expression iff the suffix \(\sigma \) matches the derivative [2, 7, 32].

Following this idea, assume that a non-empty regular expression has been decomposed into disjunctive normal form for some \(n\geqslant 1\), in which none of the , for contains an occurrence of . We can further decompose each into such that

  •   recognizes the first state of sequences of states recognized by ;

  • the regular expression recognizes sequences of states after the first state of sequences of states recognized by .

We define for non-empty -free regular expressions by structural induction, as follows.

figure hn

The following Lem. 2 shows the equivalence of an alternative-free regular expression and its first and next decomposition.

Lemma 2

Let be a non-empty -free regular expression and , . Then is -free and .

6 The model checking abstraction

The model checking abstraction in Section 4 is impractical for structural model checking since e.g. when checking that a trace concatenation of a statement list for a specification where is a trace of and is a trace of , we first check that satisfies and then we must check for a continuation of which should be derived from and . This is not provided by the boolean abstraction which needs to be refined as shown below.

Example 5

Assume we want to check for the regular specification by first checking the first statement and then the second. Knowing the boolean information that model checks for is not enough. We must also know what to check the continuation for. (This is that is if is equal to the initial value plus 1 at , it is equal to this initial value plus 3 at .)    \(\square \)

The model-checking of a trace with initial environment for a -free specification is a pair where the boolean states whether the specification holds for the trace and specifies the possible continuations of according to , if none.

Example 6

For , we have and (we have ignored the initial empty statement list in to simplify the specification).    \(\square \)

The fact that returns a pair where is to be satisfied by continuations of allows us to perform program model checking by structural induction on the program in Section 8. The formal definition is the following.

figure jq
figure jr

The model checking of a stateful trace in (11) returns a pair specifying whether satisfies the specification (when ) or not (when ). So if in (12) then the trace is a counter example to the specification specifies what a continuation of would have to satisfy for to satisfy (nothing specific when ).

Notice that checks whether the given trace satisfies the regular specification for initial environment . Because only one trace is involved, this check can be done at runtime using a monitoring of the program execution. This is the case Fred Schneider’s security monitors [35] in Ex. 1 (using an equivalent specification by finite automata).

The set of traces model checking returns the subset of traces of satisfying the specification for the initial environment . Since all program executions are involved, the model checking of a program becomes, by Rice theorem, undecidable.

The regular specification is relational in that it may relate the initial and current states (or else may only assert a property of the current states when never refer to the initial environment ). If is an execution trace satisfying the specification then in (15) determines a relationship between the initial environment and the current environment . For example , with expresses that the initial values of variables are denoted would state that there is no constraint on the initial value of variables. The difference with the invariant specifications of is that the order of computations is preserved. can specify in which order program points may be reached, which is impossible with invariantsFootnote 4.

The model checking abstraction (12) which, given an initial environment and an -free regular specification , returns the set of traces satisfying this specification is the lower adjoint of the Galois connection

figure ls

If is a poset, is a complete lattice, then where and , pointwise. This implies that

figure ma

To follow the tradition that model checking returns a boolean answer this abstraction can be composed with the boolean abstraction

figure mb

where .

7 Soundness and completeness of the model checking abstraction

The following Th. 1 shows that the Def. 1 of model checking a program semantics for a regular specification is a sound and complete abstraction of this semantics.

Theorem 1

(Model checking soundness and completeness .

figure mf

At this point we know, by (15) and Th. 1 that a model checker is a sound and complete abstraction of the program component semantics which provides a counter example in case of failure. This allows us to derive a structural model checker in Section 8 by calculational design.

8 Structural model checking

By Def. 1 of the model checking of of a program for a regular specification and initial environment , Th. 1 shows that a model checker is a sound and complete abstraction of the program semantics . This abstraction does not provide a model checking algorithm specification.

The standard model checking algorithms [10] use a transition system (or a Kripke structure variation [26]) for hardware and software modeling and proceeding by induction on computation steps.

In contrast, we proceed by structural induction on programs, which will be shown in Th. 2 to be logically equivalent (but maybe more efficient since fixpoints are computed locally). The structural model checking of the program proceeds by structural induction on the program structure:

figure mt

where the transformer uses the results of model checking of the immediate components and involves a fixpoint computation for iteration statements.

The following Th. 2 shows that the algorithm specification is correct, that is for all program components . So together with Th. 1, the structural model checking is proved sound and complete.

Theorem 2

.

The proof of Th. 2 is by calculational design and proceeds by structural induction on the program component . Assuming for all immediate components of statement, the proof for each program component has the following form.

figure ne

For iteration statements, is a fixpoint, and this proof involves the fixpoint transfer theorem [16, Th. 7.1.0.4 (3)] based on the commutation of the concrete and abstract transformer with the abstraction. The calculational design of the structural model checking is shown below.

figure nh
figure ni

Proof

— In case (19) of a program the calculational design is as follows.

figure nk
figure nl

For the assignment in (23), case (a) checks the prefixes that stops at whereas (b) and (c) checks the maximal traces stopping after the assignment. In each trace checked for the specification the states are checked successively and the continuation specification is returned together with the checked trace, unless the check fails. Checking the assignment in (23) for consists in first checking at and then checking on after the statement. In case (b), is empty so trivially satisfied. Otherwise, in case (c), so is checked after the statement while is the continuation specification.

Proof

— In case (23) of an assignment statement , the calculational design is as follows.

figure oa
figure ob
figure oc
figure od

— The model checking of an iteration statement in (27) checks one more iteration (after checking the previous ones as recorded by X) while the fixpoint (26) repeats this check for all iterations. Case (a) checks the prefixes that stops at loop entry . (b) and (c) check the exit of an iteration when the iteration condition is false, (b) when the specification stops at loop entry before leaving and (c) when the specification goes further. (d), (e) and (f) check one more iteration when the iteration condition is true. In case (d), the continuation after the check of the iterates is empty so trivially satisfied by any continuation of the prefix trace. In case (e), the continuation after the check of the iterates just impose to verify on iteration entry and nothing afterwards. In case (f) the continuation after the check of the iterates requires to verify at the loop entry, at the body entry, and the rest of the specification for the loop body (which returns the possibly empty continuation specification The cases (b) to (f) are mutually exclusive.

9 Notes on implementations and expressivity

Of course further hypotheses and refinements would be necessary to get an effective algorithm as specified by the Def. 3 of structural model checking. A common hypothesis in model checking is that the set of states is finite. Traces may still be infinite so the fixpoint computation (26) may not converge. However, infinite traces on finite states must involve an initial finite prefix followed by a finite cycle (often called a lasso). It follows that the infinite set of prefix traces can be finitely represented by a finite set of maximal finite traces and finite lassos. Regular expressions can be attached to the states as determined by the analysis, and there are finitely many of them in the specification. These finiteness properties can be taken into account to ensure the convergence of the fixpoint computation in (26).

A symbolic representation of the states in finite/lasso stateful traces may be useful as in symbolic execution [24] or using BDDs [6] for boolean encodings of programs. By Kleene theorem [34, Theorem 2.1, p. 87], a convenient representation of regular expressions is by (deterministic) finite automata e.g. [28]. Symbolic automata-based algorithms can be used to implement a data structure for operations over sets of sequences [22].

Of course the hypothesis that the state space is finite and small enough to scale up and limit the combinatorial blow up of the finite state-space is unrealistic [11]. In practice, the set of states is very large, so abstraction and a widening/dual narrowing are necessary. A typical trivial widening is bounded model checking (e.g. widen to all states after n fixpoint iterations) [5]. Those of [30] are more elaborated.

10 Conclusion

We have illustrated the idea that model checking is an abstract interpretation, as first introduced in [17]. This point of view also yields specification-preserving abstract model checking [18] as well as abstraction refinement algorithms [20].

Specifications by temporal logics are not commonly accepted by programmers. For example, in [31], the specifications had to be written by academics. Regular expressions or path expressions [8] or more expressive extensions might turn out to be more familiar. Moreover, for security monitors the false alarms of the static analysis can be checked at runtime [29, 35].

Convergence of model checking requires expressivity restrictions on both the considered models of computation and the considered temporal logics. For some expressive models of computation and temporal logics, state finiteness is not enough to guarantee termination of model checking [17, 21]. Finite enumeration is limited, even with symbolic encodings. Beyond finiteness, scalability is always a problem with model checking and the regular software model checking algorithm is no exception, so abstraction and induction are ultimately required to reason on programs.

Most often, abstract model checking uses homomorphic/partitioning abstractions e.g. [4]. This is because the abstraction of a transition system on concrete states is a transition system on abstract states so model checkers are reusable in the abstract. However, excluding edgy abstractions as in [13], state-based finite abstraction is very restrictive [21] and do not guarantee scalability (e.g. SLAM [3]). Such restrictions on abstractions do not apply to structural model checking so that abstractions more powerful than partitioning can be considered.

As an alternative approach, a regular expression can be automatically extracted by static analysis of the program trace semantics that recognizes all feasible execution paths and usually more [19]. Then model-checking a regular specification becomes a regular language inclusion problem [27].