1 Introduction

This special issue of the journal Software Tools for Technology Transfer (STTT) contains revised and extended versions of six papers selected out of the 42 papers presented at the 20th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’14) [4], held in Grenoble, France, during April 7–11, 2014, as part of the Joint European Conferences on Theory and Practice of Software (ETAPS). The peer-reviewed papers collected in this special issue were invited by the guest editors from among the top papers presented at TACAS’14, based on their relevance to STTT.

TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The research areas covered by TACAS include, but are not limited to, formal methods, software and hardware specification and verification, static analysis, dynamic analysis, model checking, theorem proving, decision procedures, real-time, hybrid and stochastic systems, communication protocols, programming languages, and software engineering. TACAS provides a venue where common problems, heuristics, algorithms, data structures, and methodologies in these areas can be discussed and explored.

Due to the increasing complexity of software systems, there is a growing need for automated software synthesis and analysis. In the last decade, active research in the formal methods community has brought interesting results and valuable tools. However, there are still challenges to face and hard problems to be solved. As the size of software systems keeps growing, the scalability of automated synthesis and analysis techniques is a highly relevant issue.

The selected papers cover four domains, which we believe reflect current trends within the formal methods community, and which are discussed below and organized as follows. Section 2 discusses parallel algorithms and their application to, for example, model checking. Section 3 discusses SAT and SMT solving. Section 4 discusses runtime verification. Section 5 discusses probabilistic systems and their verification. Finally, Sect. 6 concludes the paper.

2 Distributed and parallel algorithms

Nowadays, nearly all personal computers have multi-core CPUs, the usage of cloud and grid computing is rising, and there have been great advances in supercomputer architectures. The performance of the fastest supercomputers available today has reached the PetaFLOPS scale, i.e., they can execute \(10^{15}\) floating point operations per second (FLOPS). The next generation of Exa-scale supercomputers with \(10^{18}\) FLOPS performance is under development.

Distributed and parallel computing techniques make use of such hardware structures to solve computationally intensive problems. Typical areas for massively parallel applications are, e.g., weather forecasting, climate research, and simulations of chemical, biological and physical processes.

However, the efficient usage of these distributed and parallel computer architectures is not at all trivial. The computational effort might be unbalanced between the processes, such that one process might have to wait a long time for results computed by another process, thereby wasting available computing resources. Furthermore, massive communication (e.g., message broadcasting in a massively parallel application) can itself become a bottleneck for efficiency.

For these reasons, a linear speedup (in the number of cores used) of the computation time is hard to achieve. Performance analysis techniques and tools help developers to identify execution bottlenecks by monitoring the program execution and by computing and visualizing characteristic quantities such as average waiting times at certain control points. However, there is a strong need for further improvements. We still struggle to exploit the capabilities of Peta-scale supercomputers, and no one yet knows how to achieve this at the forthcoming Exa-scale, where, besides scalability issues, additional problems like increased failure rates must be faced [3].

Besides efficiency, a central problem is the correctness of parallel programs, which has different facets. Deadlocks can happen when threads or processes wait for each other in a cyclic manner, such that none of them can continue its execution. Furthermore, correct parallel programs should preferably (in the general case) yield the same result, independently of the temporal order in which the processes execute. For example, in multi-threaded programs, mutual exclusion must be used in a safe manner to ensure atomic computation where needed. To achieve functional correctness, if a problem is decomposed into sub-problems, the overall result must be carefully synthesized from the sub-results.
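To make the cyclic-wait scenario concrete, the following minimal sketch (in Python, purely for illustration) shows two threads acquiring two locks in opposite orders; the sleep call is only there to make the problematic interleaving likely, and a correct program would instead fix a global lock-acquisition order.

```python
# A minimal sketch of a cyclic-wait deadlock: two threads acquire two
# locks in opposite orders and end up waiting for each other forever.
import threading
import time

lock_a, lock_b = threading.Lock(), threading.Lock()

def worker(first, second, name):
    with first:
        time.sleep(0.1)                  # make the bad interleaving likely
        print(f"{name} waiting for its second lock")
        with second:                     # each thread now waits for the other
            print(f"{name} acquired both locks")

t1 = threading.Thread(target=worker, args=(lock_a, lock_b, "t1"), daemon=True)
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, "t2"), daemon=True)
t1.start(); t2.start()
t1.join(timeout=1); t2.join(timeout=1)
print("deadlocked:", t1.is_alive() and t2.is_alive())
```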

To assure such correctness properties, formal methods can be used for the verification of parallel programs. Whereas the theoretical roots of the verification of parallel programs are historically relatively deep [12, 31, 62, 66, 71], current approaches are still far from being scalable at the supercomputing level. New advances in this direction use parallel computing itself for verification, i.e., the verification algorithms themselves are parallel programs. Besides deductive techniques, powerful parallel model checking approaches have become available. However, to achieve scalability, these techniques too must reach a good load balance between the parallel running model checking processes.

Early attempts to overcome this problem include, e.g., techniques for partial order reduction and slicing [42, 52], and randomization [19, 75]. Several efforts have been made to parallelize the Spin model checker. An early attempt at distributed model checking in Spin is described in [64]. More recent work can be classified into two categories: multi-core approaches [40, 53, 55], where model checking is distributed over several cores of the same machine, and cloud approaches [54, 56], where model checking is distributed over multiple machines in the cloud. Concerning multi-core approaches, [55] presents a general method for safety verification, while [53] and [40] concern an algorithm for partial verification of liveness properties with parallel breadth-first search. Concerning cloud approaches, [54] describes how the use of massive parallelism in a cloud-computing context may deliver near real-time performance. This is furthermore an application of what is referred to as a swarm approach, where multiple independent and different instances of the verification problem are launched in parallel.

In this volume, we report on three recent developments on this research front. The paper Concurrent Depth-First Search Algorithms based on Tarjan’s Algorithm [68], by Gavin Lowe, an extension of the TACAS’14 conference paper [67], deals with parallelization issues for some important graph-related problems: finding strongly connected components, cycles, and lassos in graphs. Tarjan’s algorithm is widely used in its sequential version; however, its efficient parallelization was still an open challenge. The proposed parallelized version may find application in model checking algorithms, for example to check which states are divergent, i.e., which states can lead to an unbounded number of internal steps.
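As a point of reference, the sketch below gives the classic sequential formulation of Tarjan’s algorithm that the paper takes as its starting point; the graph encoding (a dictionary of adjacency lists) and the recursive style are illustrative simplifications rather than the paper’s implementation, and the recursion would not scale to the very large graphs arising in model checking.

```python
# A minimal sketch of the classic sequential Tarjan SCC algorithm.
# graph maps each node to a list of successors (illustrative only).
def tarjan_scc(graph):
    index_of, lowlink, on_stack = {}, {}, set()
    stack, sccs = [], []
    counter = [0]

    def strongconnect(v):
        index_of[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index_of:                      # tree edge: recurse
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:                        # back edge into the stack
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:                  # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index_of:
            strongconnect(v)
    return sccs
```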

The paper FDR3—A Parallel Refinement Checker for CSP [43], by Thomas Gibson-Robinson, Philip Armstrong, Alexandre Boulgakov, and A. W. Roscoe, extends the TACAS’14 publication [44]. It presents the FDR3 tool, a complete rewrite of the FDR2 refinement checker with extended functionality. Amongst its several improvements are a new parallel refinement-checking algorithm, able to achieve a near-linear speedup as a function of the number of cores utilized (including clusters of cores), and a new algorithm used to construct the internal representation of CSP processes. FDR3 is furthermore able to make efficient use of on-disk storage once main memory is exhausted. FDR3 relies on Tarjan’s algorithm, and future work includes pursuing alternative methods of parallelizing divergence checking, including methods based on Gavin Lowe’s work presented in this volume.

The paper Many-Core On-the-Fly Model Checking of Safety Properties Using GPUs [80], by Anton Wijs and Dragan Bošnački, presents another parallelization approach for model checking. A previous version of this paper was published in [79]. Despite advances in model checking, state space explosion is still a hindrance to scalability. Smart concurrent solutions can push the boundaries of applicability forward; however, the resources for concurrent execution are relatively limited on standard home computers. To best exploit these resources, this work makes use of general-purpose computing on graphics processing units (GPUs). This is an extremely challenging path of research, to which this paper makes an important contribution.

3 SAT and SAT-modulo-theories solving

A further active research area is the integration of different techniques to form powerful and efficient tools. In this context, problems are frequently encoded logically and then checked for satisfiability using SAT and SAT-modulo-theories (SMT) solvers.

SAT solving aims at automatically checking propositional logic formulas for satisfiability. The technology’s development started around 1960. The first approaches used enumeration and resolution [29]. The combination of enumeration with propagation led to the well-known DPLL algorithm [28]. A breakthrough regarding efficiency and scalability was achieved by combining enumeration and propagation with resolution, in order to identify reasons explaining why certain (partial) assignments do not satisfy a formula. This resulted in the conflict-driven clause learning (CDCL) approach [69, 82], whose impact is well reflected in the following citation from the zChaff webpage: “We have success stories of using zChaff to solve problems with more than one million variables and 10 million clauses. (Of course, it can’t solve every such problem!)”. After the pioneering solvers GRASP [69] and zChaff [82], a variety of other SAT solvers were developed, featuring watched-literal techniques and smart heuristics for, e.g., clause learning and forgetting, dynamic variable ordering, and restarts. To mention just one of them, MiniSAT [38] is not only highly efficient but also small and, therefore, well suited for understanding and teaching the SAT mechanisms.
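To make the basic scheme concrete, the following minimal sketch combines enumeration with unit propagation in the style of DPLL; the clause representation (lists of non-zero integers, following the DIMACS convention) and the naive branching heuristic are illustrative simplifications, and CDCL solvers add clause learning, watched literals, and restarts on top of this skeleton.

```python
# A minimal DPLL-style sketch: unit propagation plus branching.
# Clauses are lists of non-zero integer literals (DIMACS convention).
def dpll(clauses, assignment=None):
    assignment = dict(assignment or {})

    # Unit propagation: repeatedly assign literals forced by unit clauses.
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif (lit > 0) == val:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None                          # conflict: clause falsified
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True

    free = {abs(l) for c in clauses for l in c} - assignment.keys()
    if not free:
        return assignment                            # satisfying assignment found
    var = next(iter(free))                           # naive branching heuristic
    for value in (True, False):
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None

# Example: (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(dpll([[1, 2], [-1, 2], [-2, 3]]))
```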

The scalability of SAT solvers opened the way to real-world applications. Besides academic applications in different research areas, nowadays many companies also use SAT solving, e.g., to solve huge combinatorial problems or for digital circuit design and verification.

The introduction of a standardized input language was a great achievement and an important milestone in the success of SAT solving. On the application side, it allows users to formalize their problems once and apply a wide range of solvers to them. On the development side, it enabled the collection of large benchmark sets and the start of competitions in 2002. In 2014, the SAT competition had an impressive number of 79 participants with 137 solvers. The SATLive! forum and dedicated conferences and journals further support the community with platforms for exchange.

The success of SAT solving made it promising to apply similar technologies to more expressive logics, resulting in SAT-modulo-theories (SMT) solving. The idea is to use SAT solvers to find solutions for the Boolean skeleton of quantifier-free first-order logic problems, and to use dedicated theory solvers to check the corresponding sets of theory constraints for consistency. Some of the important milestones in this area were the development of decision procedures for combined theories [70, 74], and the extension DPLL(T) [37] of the DPLL approach for SAT with theories.
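The following is a minimal sketch of this lazy interplay under strong simplifying assumptions: the theory is plain equality over variables, theory consistency is checked with a union-find structure, and the SAT solver is replaced by brute-force enumeration of the Boolean skeleton; real DPLL(T) solvers instead integrate an incremental theory solver inside the SAT search and learn theory lemmas.

```python
# A minimal lazy-SMT sketch: enumerate Boolean assignments of the atoms,
# then check the asserted (dis)equalities for theory consistency.
from itertools import product

def theory_consistent(assignment, atoms):
    # atoms[i] = ("x", "y") means the atom "x == y"; union-find over variables
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, (a, b) in enumerate(atoms):
        if assignment[i]:                            # asserted equality: merge
            parent[find(a)] = find(b)
    for i, (a, b) in enumerate(atoms):
        if not assignment[i] and find(a) == find(b):
            return False                             # asserted disequality violated
    return True

def smt_solve(clauses, atoms):
    # clauses over atom indices: literal i means atom i, -i its negation
    n = len(atoms)
    for bits in product([True, False], repeat=n):    # stand-in for a SAT solver
        assignment = dict(enumerate(bits))
        boolean_ok = all(
            any(assignment[abs(l) - 1] == (l > 0) for l in clause)
            for clause in clauses)
        if boolean_ok and theory_consistent(assignment, atoms):
            return assignment
    return None

# Example: (x==y) and (y==z) and not (x==z) is theory-unsatisfiable.
atoms = [("x", "y"), ("y", "z"), ("x", "z")]
print(smt_solve([[1], [2], [-3]], atoms))            # -> None
```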

Among the first theories considered for SMT solving were equality with uninterpreted functions, bit-vectors, and arrays. Later on, solutions for linear real and integer arithmetic and fragments thereof were also implemented. The latest developments address challenging extensions to non-linear arithmetic theories. These theories are supported by a large and still increasing number of SMT solvers, e.g., the tools AProVE [45], CVC [11], HySAT/iSAT [41], MathSAT [26], MiniSmt [81], OpenSMT [24], SMT-RAT [27], VeriT [23], Z3 [30], and Yices [36].

Due to this increased level of expressiveness, SMT solving finds application in a wide range of domains, e.g., verification (model checking, static analysis, termination analysis), test case generation, controller synthesis, predicate abstraction, equivalence checking, scheduling, planning, and product design automation and optimization.

The SMT community also profits from the SMT-LIB input standard and from competitions, held since 2005. In 2014, 20 solvers participated in 32 logical categories.

Where is the development heading? Surely, a major remaining issue is efficiency and scalability. Whereas large problem instances can already be solved for the easier theories, SMT solving for non-linear real and integer arithmetic is, despite encouraging progress, still an emerging area. To tackle such problems, we need dedicated SMT solvers for specific problem classes with further novel lemma generation and learning techniques, elegant ideas for the combination of decision procedures, and clever parallelization approaches. A big potential lies in learning from the decision procedures and technologies used in symbolic computation [1]. Regarding functionality, there is also a trend to increase applicability by generating unsatisfiable cores and interpolants, handling quantified formulas, and offering techniques for optimization.

In this volume, two contributions are devoted to SAT and SMT solving. SATMC 3.0, a SAT-based bounded model checker for security-critical systems, is presented in the paper SATMC: a SAT-based Model Checker for Security Protocols, Business Processes, and Security APIs [7], by Alessandro Armando, Roberto Carbone, and Luca Compagna, as an extension of [6]. It is distinguished by its combination of techniques originally developed for planning with techniques developed for the analysis of reactive systems. SATMC has been applied in a variety of application domains, including security protocols, security-sensitive business processes, and cryptographic APIs. SATMC supports a powerful specification language, including rewrite rules, Horn clauses, and first-order LTL formulae. It leverages NuSMV to generate a SAT encoding for the LTL formulae and MiniSAT to solve the resulting SAT problems.

The paper Monitoring Modulo Theories [33], by Normann Decker, Martin Leucker and Daniel Thoma, an extended version of the conference paper [32], considers an SMT-based approach to runtime verification of temporal properties over first-order theories. It lifts monitor synthesis procedures for propositional temporal logics to a temporal logic over structures within a first-order theory, and proposes a first-order monitoring algorithm that combines SMT solving with classical monitoring of propositional temporal properties. The approach is here applied to LTL, and the Z3 SMT solver is used for solving data constraints. However, the approach is generic and can be applied to any suitable temporal logic and any first-order theory for which an SMT solver is available.

4 Runtime verification

The scalability issue often associated with formal methods is due to the desire to verify (analyze) all possible execution paths of the system being analyzed, potentially for all possible inputs. This problem is computationally hard, in general even undecidable, and in practice becomes infeasible without relaxing either the kind of properties being proven or the confidence in the result. Testing is the practical, less perfect alternative to full verification. Here, test inputs are generated using a more or less automated strategy, and outputs are checked using more or less sophisticated test oracles (monitors). In industrial practice, test input generation is typically not automated (rather, test cases are created manually), and monitors are typically not very sophisticated, for example just comparing text files with a diff command.

Runtime verification (RV) [39, 50, 65] is a subfield of computer science focusing on analyzing system executions, including collections thereof, either during test (the test oracle problem) or after deployment. The field is not concerned with test case generation, which is one of the main focuses of testing research. Its purpose is to study how much we can get out of one or more execution traces, in other words, just by observing what the system does when executing. The field obviously intersects with testing by contributing to how to write advanced test oracles.

Runtime verification as a field covers various sub-fields. Specification-based monitoring is concerned with checking a program execution against a formal specification of one or more requirements. A program P to be monitored is instrumented to emit a sequence \(\tau \) of observable events, which is fed into a monitor that takes as a second input a specification \(\psi \) of expected behavior. The trace is then matched against the specification, formalized as \(\tau \models \psi \). Instrumentation can, for example, be performed using aspect-oriented programming. Static analysis can be used to minimize the number of instrumentation points, a topic receiving increasing attention from the research community.
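As a minimal illustration of a specification-based monitor, the sketch below checks a single safety property over a finite trace of propositional events; the property (“a resource may only be accessed between open and close”) and the event names are invented for this example and are not taken from any cited system.

```python
# A minimal propositional monitor: scan the trace and report the first
# point at which the safety property is violated (illustrative only).
def monitor(trace):
    is_open = False
    for step, event in enumerate(trace):
        if event == "open":
            is_open = True
        elif event == "close":
            is_open = False
        elif event == "access" and not is_open:
            return f"violation at step {step}: access while closed"
    return "no violation observed"

print(monitor(["open", "access", "close", "access"]))  # violation at step 3
```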

In practice, events carry data, as in open(“file42”), in contrast to propositional events, such as openFile, and it must be possible to refer to these data in specifications. Recent research has focused on efficient and low-impact monitoring of such data-carrying events, referred to as parametric monitoring. Such data must be stored, and especially searched, efficiently as part of the monitor. The 1st Intl. Competition of Software for Runtime Verification (CSRV’14), focusing in particular on parametric monitoring, was held together with the RV conference in 2014 in Toronto, Canada.

Detection of a property violation can be used not only for testing an application, but also during operation in the field, to cause a change of behavior by triggering fault-protection code, which steers the application out of a bad situation. The extreme RV solution is the use of planning and scheduling techniques, which continuously adapt to the current situation. For a survey relating verification and validation to planning and scheduling, see [21].

Over the last 15 years, numerous specification-based runtime verification systems have been developed, only a few of which will be mentioned here. Initial specification-based systems could only handle propositional events. These include, for example, Temporal Rover [35], MaC [63], and Java PathExplorer [51]. The first systems able to handle parameterized events appeared around 2004, and include [5, 14, 25, 76]. Several parametric monitoring systems have appeared since then. RV systems usually implement specification languages based on formalisms such as state machines [13, 25, 46], regular expressions [5, 25], temporal logic [17, 18, 25, 32, 48, 76], variations of the \(\mu \)-calculus [14], grammars [25], and rule-based systems [16, 49]. A few of these logics incorporate time as a built-in concept, typically embedded in temporal logics, as for example in [17]. If no special concept of time is introduced, time observations can be treated as just data (time stamps).

Runtime verification systems are based on different algorithms. Slicing-based algorithms have proven very efficient [5, 13, 25]. These algorithms conceptually slice a trace into projections, one projection for each parameter combination. The efficiency of these algorithms generally comes at the cost of some expressiveness, as pointed out in [13]. Other monitoring systems represent data as constraints. An example of a constraint-based system is the first-order linear temporal logic monitor described in [18]. Some systems based on linear temporal logic apply rewriting of temporal formulas. These include, for example, [14, 15, 48, 76]. Rule-based systems, such as [16, 49], operate with a collection of facts, usually organized in an efficient data structure/network, which is modified by the rules.
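The sketch below illustrates the slicing idea under simplifying assumptions: the full trace of data-carrying events is available up front, each event carries a single parameter (a file name), and the per-slice property is checked by a trivial propositional monitor; systems such as [5, 13, 25] instead build and index the slices incrementally as events arrive.

```python
# A minimal parametric-slicing sketch: project the trace onto one slice
# per parameter value, then run a propositional monitor on each slice.
from collections import defaultdict

def monitor_slices(trace, propositional_monitor):
    slices = defaultdict(list)
    for event_name, parameter in trace:              # e.g. ("open", "file42")
        slices[parameter].append(event_name)
    return {p: propositional_monitor(s) for p, s in slices.items()}

def open_matched_by_close(events):                   # per-slice property sketch
    return "ok" if events.count("open") == events.count("close") else "violation"

trace = [("open", "a.txt"), ("open", "b.txt"), ("close", "a.txt")]
print(monitor_slices(trace, open_matched_by_close))  # b.txt is never closed
```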

Most of the logics mentioned above are so-called external DSLs, small languages with their own grammar and parser. However, systems have also been developed that offer APIs in programming languages (also referred to as internal DSLs) for writing monitors. These include [15, 22, 49].

Specifications can be written by humans, or they can be learned from nominal executions, also referred to as specification mining [57]. A form of runtime verification not requiring specifications is what we will refer to as runtime analysis, where program executions are analyzed with specialized algorithms. Examples include algorithms for detecting concurrency problems such as deadlock potentials [20] and data races [8, 73]. Finally, visualization of execution traces supports human comprehension of what the system does [72]. Trace visualization is related to specification mining in that it produces an abstraction of the system’s behavior, although only for the eyes.

The focus of future runtime verification research will include continued studies of how to optimize monitoring algorithms to use less time and space. Static analysis can be combined with dynamic analysis to minimize the code instrumentation performed, thereby reducing the impact on the monitored program. Another way of looking at this problem is to use static analysis to prove as much as possible of a property, and then use runtime verification to monitor the remaining unproven proof obligations. There will be continued research on expressive and succinct logics, potentially merging well-known formalisms such as temporal logic, regular expressions, and state machines/rule systems. We may eventually see the emergence of such logics in programming languages as part of the design-by-contract paradigm. Specifications are hard to write, and specification mining and visualization may contribute a great deal to easing this task. Each time we run our program, we should learn from it. The ultimate proof of success of this field will be widespread deployment of monitoring against logic-based requirements in industrial applications.

This area of research is represented in this volume by the paper Monitoring Modulo Theories [33], by Normann Decker, Martin Leucker, and Daniel Thoma, already mentioned in Sect. 3.

5 Probabilistic systems

Probabilistic systems are systems with randomized behavior. Examples are probabilistic algorithms that draw random values from probability distributions, computer systems with inherent randomization such as quantum computers or approximate computing, and biological systems whose evolution can be modeled probabilistically.

There are different languages for modeling probabilistic systems. Popular automata-based modeling formalisms for probabilistic systems are discrete- and continuous-time Markov chains, and variants thereof that exhibit non-determinism, such as Markov decision processes or probabilistic automata. Probabilistic programs use, in addition to the standard programming constructs, probabilistic branching and probabilistically determined values in assignments, and are well suited for high-level modeling.
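As a small illustration of what is meant by a probabilistic program, the sketch below models a geometric retry loop using probabilistic branching; here its behavior is merely estimated by sampling, whereas probabilistic model checkers analyze such programs exactly or up to a guaranteed precision.

```python
# A minimal probabilistic-program sketch: a retry loop that terminates
# with probability p in each iteration (illustrative only; analyzed here
# by sampling rather than by probabilistic model checking).
import random

def geometric_trial(p=0.5):
    attempts = 1
    while random.random() >= p:      # probabilistic branching
        attempts += 1                # retry until the probabilistic test succeeds
    return attempts

samples = [geometric_trial() for _ in range(10_000)]
print("estimated expected attempts:", sum(samples) / len(samples))  # about 2 for p = 0.5
```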

To describe the behavior of probabilistic models, probabilistic properties like “the (maximal) probability to reach a set of bad states is at most 0.1” can be formalized in different property specification languages. Probabilistic computation tree logic (PCTL) extends the logic CTL with probabilities and can be used to describe properties of discrete-time models. Continuous stochastic logic (CSL) is a PCTL extension supporting the specification of continuous-time properties. Last but not least, probabilistic linear-time temporal logic (PLTL), a probabilistic extension of LTL, can be used to specify probabilistic liveness properties.

Efficient model checking algorithms for these models and logics have been developed, implemented in a variety of software tools, and applied to case studies from various application areas. The crux of probabilistic model checking [9, 10, 59, 60] is to appropriately combine techniques from numerical mathematics and operations research with standard reachability analysis and model checking techniques. In this way, properties can be checked automatically up to a user-defined precision. Markovian models comprising millions of states can be checked rather quickly by dedicated tools such as MRMC [58] and PRISM [61]. These tools are currently being extended with counterexample generation facilities, in order to provide useful diagnostic feedback in case a property is violated [2].
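To give a flavor of this numerical core, the sketch below approximates the probability of eventually reaching a set of target states in a discrete-time Markov chain by value iteration up to a user-defined precision; the model, the state names, and the simple convergence criterion are illustrative assumptions rather than what MRMC or PRISM actually implement.

```python
# A minimal value-iteration sketch for reachability probabilities in a
# discrete-time Markov chain (illustrative only).
def reachability_probability(transitions, targets, epsilon=1e-8):
    # transitions: state -> list of (successor, probability) pairs
    prob = {s: (1.0 if s in targets else 0.0) for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            if s in targets:
                continue
            new = sum(p * prob[t] for t, p in transitions[s])
            delta = max(delta, abs(new - prob[s]))
            prob[s] = new
        if delta < epsilon:          # stop once the update is below the precision
            return prob

# Example: from s0, the target "goal" is reached with probability 2/3.
dtmc = {
    "s0":   [("goal", 0.5), ("s1", 0.5)],
    "s1":   [("s0", 0.5), ("sink", 0.5)],
    "goal": [("goal", 1.0)],
    "sink": [("sink", 1.0)],
}
print(reachability_probability(dtmc, {"goal"})["s0"])   # approx. 0.6667
```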

To be able to formalize and analyze systems with uncertain behavior or incomplete specifications, parametric modeling languages and corresponding probabilistic model checking techniques have also been investigated, resulting in tools like PARAM [47] and PROPhESY [34].

Despite these intensive and successful developments, several challenging, hard, and practically relevant problems remain to be solved. There have been some achievements on probabilistic hybrid systems, which have probabilistic components either in their discrete or in their continuous behavior. However, these techniques need to be strengthened to reach practical applicability. Scalability is also still an issue. Though model checking tools can handle huge models, novel symbolic approaches and abstraction techniques are needed to analyze probabilistic programs with large variable domains or large-scale parallelism. To mention a last challenge, probabilistic domain-specific languages, and formal methods for their analysis, would help to model and analyze applications from the areas of high-performance and approximate computing.

The application of existing techniques and tools to case studies is extremely important, as it brings highly valuable insights into applicability, highlights bottlenecks, drives research towards important practical problems, and eases technology transfer to industry. In this volume, a report on an interesting case study is given in the paper Probabilistic Verification and Synthesis of the Next Generation Airborne Collision Avoidance System [78], by Christian von Essen and Dimitra Giannakopoulou, extending the TACAS’14 publication [77]. ACAS X, the next-generation airborne collision avoidance system, uses probabilistic models to represent different types of uncertainty. The authors give a nice example of how the power of existing formal methods and frameworks can be bundled by integrating them into a tool dedicated to a special problem class.

6 Conclusion

Some recent advances in automated analysis have been discussed and related to selected papers from TACAS 2014, included in this volume. Four domains have been identified: the parallelization of algorithms, including algorithms for verifying systems, specifically model checking; SAT and SMT solving with a basis in first-order logic; runtime verification; and finally probabilistic systems. Parallel algorithms, SAT/SMT solving, and runtime verification illustrate different ways of dealing with the scalability problem of formal methods. Parallel algorithms and SAT/SMT solving can be considered successful techniques for solving the traditional verification problem, whereas runtime verification is an example of shifting the problem from verification of full models to analysis of single traces. Probabilistic systems modeling and verification is an example of a new domain, requiring new techniques altogether.