
1 Introduction

Software testing based on formal specifications has a slightly troubled history. In the seventies, most researchers in each of the two fields considered that the other field had nothing to offer them. Even worse, some influential scientists on both sides expressed doubts about the relevance of the other field: on the side of the software-testing research community, one can cite De Millo et al. [13]; and on the side of the formal-approaches-to-software-engineering community, one can cite Dijkstra’s famous curse against testing [15], which definitely deserves to be quoted:

Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence. The only effective way to raise the confidence level of a program significantly is to give a convincing proof of its correctness.

Another quotation from Dijkstra, more directly relevant to the topic of this paper, is:

A common approach to get a program correct is . . . by subjecting the program to numerous test cases. From the failures around us we can derive ample evidence that this approach is inadequate. To remedy the situation it has been suggested that what we really need are “automatic test case generators” by which the pieces of program to be validated can be exercised still more extensively. But will this really help? I don’t think so. (EWD303, year unknown; Footnote 1)

Fifty years later, the idea of software testing methods based on formal specifications is accepted as respectable among the numerous existing approaches to software testing [12], and some powerful tools exist. Most formal specification methods come with some notion of test derivation, submission, and verdict, and there are a number of success stories. Moreover, the complementarity of tests and proofs in software validation and verification is the subject of active research, as attested by numerous publications and by the fact that an international conference, “Tests and Proofs”, has taken place annually since 2007. There is an excellent survey by Hierons et al., titled “Using Formal Specifications to Support Testing”, published in 2009 in ACM Computing Surveys [22].

Therefore, it is of interest to look back at this evolution towards convergence of two research fields that were originally so distant.

2 Software Engineering, Formal Methods, and Testing in the Seventies and the Beginning of the Eighties

A possible subtitle of this section could be: Why formal? Why testing?

In the seventies, formal approaches to software development, validation, and verification were not considered credible by the majority of software engineers and software testers. Symmetrically, as mentioned above, many supporters of formal approaches considered software testing an ineffective activity doomed to disappear once formal methods were generally adopted.

My interest in testing based on formal specifications arose during my Ph.D., whose topic was compiler generation based on formal semantics of the source and target languages. I needed examples of source programs to test the generated compilers, and my attention was drawn to a work by Houssais on the verification of an Algol 68 implementation [24] and to a little-known subsequent research report of the Université de Rennes 1, titled “Un générateur de tests commandé par les grammaires” (a test generator driven by grammars). Since testing compilers was not the main focus of my Ph.D., I did not spend much time on fully exploiting Houssais’ ideas. However, I used them as guidelines, which turned out to be useful.

I had noticed some similarities between programming language definitions and algebraic data types. In both cases, there is a syntax (for algebraic data types, the signature) and some constraints (contextual rules for programming languages, axioms for algebraic data types). This led me to propose to Luc Bougé, as a Ph.D. subject, the generation of tests from algebraic specifications. The thesis was defended in 1982 [3], followed by four others at the end of the eighties. It was the origin of successful developments, including industrial applications, that are summarised in [20]. At that time, it was not easy to publish on this subject. Dijkstra’s curse was very present in the minds of researchers in formal methods. Our first publication at an international conference was in 1985 [4]. However, these ideas turned out to be a rather good selling point of formal methods to industry: in October 1981, I was hired to create a research group on the use of such methods for software engineering in the research laboratory of the Compagnie Générale d’Electricité (Footnote 2).
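To give a concrete flavour of these ideas, here is a minimal, textbook-style algebraic specification of stacks (a schematic illustration, not taken from the works cited above): a signature declaring sorts and operations, together with axioms constraining them.

  sorts:       Stack, Elem
  operations:  empty : → Stack
               push  : Stack × Elem → Stack
               pop   : Stack → Stack
               top   : Stack → Elem
  axioms:      pop(push(s, e)) = s
               top(push(s, e)) = e

Tests can then be derived by instantiating the axioms with ground terms: for instance, the instance top(push(empty, e0)) = e0 yields a test case whose verdict is obtained by running the corresponding calls of the implementation and comparing the results. Note that pop(empty) and top(empty) are left unspecified here, an instance of the partiality issue mentioned below.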

At the same time, i.e. 1981–1983, several research efforts were conducted on exactly the same topic, namely testing based on algebraic abstract data types: Gannon, McMullin, and Hamlet in the U.S. [19], and Jalote in India [26]. A lot of technical questions were still open, but the fundamental ideas were well stated. Among the open issues at that time, there were:

  • the determination of equality between two test results, the difficulty arising from the difference in levels of abstraction between the specification and the implementation under test (Footnote 3); this issue, often called the oracle problem, is illustrated after this list.

  • the occurrence of partial operations in the specification, with two causes of partiality: either the result is mathematically undefined, or it is left unspecified, the specification allowing some freedom to the implementation.

  • the possibility of non-determinism in the specification or in the implementation.

Note that these issues are not due to the use of formal specifications. They arise in any testing method that is specification-based. For some hints on how they were later treated formally, one can see [20] and [22].
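To illustrate the first issue on a schematic example (again, not drawn from the cited works), consider a specification of finite sets with an axiom stating that insertion is commutative:

  insert(insert(S, x), y) = insert(insert(S, y), x)

An implementation representing sets by lists may legitimately return the distinct concrete values [y, x] and [x, y] for the two sides of this axiom. Deciding the verdict thus requires comparing the two results at the level of abstraction of the specification, for instance through an abstraction (or retrieve) function mapping concrete representations to the abstract values they denote, rather than through the implementation’s own equality.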

Meanwhile, another research community, with very strong motivations, had been very active in this domain: researchers in telecommunication protocols.

3 The Area of Telecommunication Protocols at the Beginning of the Eighties

Communicating systems use well-defined formats for exchanging messages. Each message has a precise role and corresponds to a range of possible responses. The specification of a protocol is independent of how it may be implemented and exploited by the telecommunication operators. Besides, testing can be based on the specification only (i.e. it is black-box testing), since manufacturers generally do not disclose implementation details.

Since communication protocols must be agreed upon by various entities such as international agencies, operators, or developers, technical standards, formal or not, were developed very early for their definition (see for instance [9]) as well as for conformance testing methodologies [25]. Certification is performed via standard abstract test sets.

In such a context, research projects and publications on specification-based testing of such protocols flourished (see [33] among the early ones). Note that this paper was published in 1984, at the 2nd IFIP International Workshop on Protocol Specification, Verification and Testing (IWPTS), which has occurred yearly since, modulo a few changes of names (Footnote 4). Since 2007, under the name TESTCOM, this conference merged with the FATES workshop (Footnote 5) and has gathered together researchers working on specification-based testing, independently of the kind of software considered: communication protocols or others.

The characteristics of the approaches developed for communication protocols were the following: the formal specification languages, such as SDL, ESTELLE, and LOTOS, were standardised, with semantics based on Finite State Machines or Labelled Transition Systems, which are well suited to the description of non-terminating processes; the notions of conformance, abstract test specification, and certification were standardised as well.

In contrast, the approaches developed in the community of formal software engineering, or those on the verge of being developed, were based on specification languages designed by research groups, with notations inspired from logic, and semantics based on axiom satisfaction (algebraic data types) or predicate transformers (Z, VDM).

4 What Happened in the Nineties and Later

The end of the eighties and the beginning of the nineties were a turning point for the topic of testing based on formal specifications. Within five years, a lot of well-founded testing methods were established and validated for the main formal specification methods. It is impossible to cite all of them. Let us just mention three sets of works (Footnote 6).

Test derivation from formal specifications with semantics based on labelled transition systems, such as LOTOS or SDL, started to be abundantly studied at the end of the eighties [5, 32, 38]. Later, in 1996, Tretmans introduced the ioco conformance relation [36], which turned out to be quite pertinent for testing systems that are input-enabled, i.e. always ready to accept any input. It has been at the origin of several tools such as TGV [27], TorX [10] and, more recently, TorXakis [37].
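For the record, ioco can be stated concisely (the notation below follows [36]). Writing Straces(s) for the suspension traces of the specification s, i.e. traces that may include the quiescence action δ, and out(p after σ) for the set of outputs, quiescence included, that a process p may exhibit after the trace σ, an input-enabled implementation i conforms to s iff:

  i ioco s  ⟺  ∀σ ∈ Straces(s) : out(i after σ) ⊆ out(s after σ)

Intuitively, after any behaviour foreseen by the specification, the implementation may produce an output, or remain quiescent, only if the specification allows it.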

For the VDM specification method, Dick and Faivre presented a testing method [14] in 1993 at the Formal Methods Europe conference (FME’93). It must be observed that this work was performed in an industrial context, namely Bull Information Systems in the UK and its Corporate Research Centre in France, and that it strongly influenced subsequent research on testing based on model-based formal specifications. At the same time, Stocks and Carrington proposed Test Templates, a test method based on Z specifications [35], which was published at the International Conference on Software Engineering (ICSE 15). Both software formalists and software engineers recognised the value of testing methods based on model-based formal specifications.

In my group, we pursued the work on testing based on algebraic specifications. In 1991, with Bernot and Marre, we published a case study on the test of binary search trees [2], and in 1993 Dauchy and Marre successfully applied the method and the tool to critical parts of the software of an automatic subway [11]. Since then, our approach has been generalised to other formal methods: full LOTOS [21], Lustre and Esterel [30], with applications to the certification of nuclear control systems, CSP [7], and Circus [8].

One observes that since the mid-nineties, test derivation from formal specifications has become a popular research topic, well accepted both in conferences and journals devoted to formal methods and in those devoted to software engineering and testing. New formal specification methods are almost systematically enriched with some test derivation method: for instance, it was the case for the B method [29, 34], and for AsmL, the Abstract State Machine Language [1], with its companion testing environment SPEC EXPLORER [39] developed at Microsoft Research.

Moreover, the cultural gap between researchers from the telecommunication world and those from the formal software engineering community is much less pronounced. Most theories, methods, and tools are in common use or unified, even if the telecommunication universe remains much more dependent on standards.

5 Conclusion

Despite Dijkstra’s curse, the convergence of formal methods and software testing has happened. Definitely, one of the antidotes to Dijkstra’s curse has been an invited talk and paper by Tony Hoare at the FME Symposium in 1996, titled “How Did Software Get So Reliable Without Proof?” [23]. There, he underlined the effectiveness of software engineering techniques, among them testing, and the fact that formal methods

provide a conceptual framework and basic understanding to promote the best of current practice, and point directions for future improvement.

As mentioned above, nowadays most formal specification methods come with some notion of test derivation, submission, and verdict. There are a number of significant success stories. A majority of them are linked with academia, but a number of them took place in industry. Testing based on formal specifications is now one of the recognised resources in the arsenal of software testing methods.

At present, theorem provers and model checkers make use of tests [16, 31], and testing tools make use of theorem provers [6, 17] and model checkers [18]. The Tests and Proofs (TAP) series of conferences and the Verified Software: Theories, Tools, Experiments (VSTTE) series of workshops are well established. They attract both communities and provide a forum for meetings and talks, ensuring the development of research activities integrating formal methods and testing.