Keywords

1 Introduction

Property-based testing is a lightweight approach to verification where expected or conjectured program properties are defined in the source programming language. For example, consider the following conjectured propertyFootnote 1 that in Haskell every function with a list of Boolean values as an argument, and a single Boolean value as result, can be expressed as a \({\mathsf{\textit{foldr} }}\) application.

figure a

When this property is tested using our advanced version of Lazy SmallCheck, a small counterexample is found for \({\mathsf{\textit{r} }}\).

figure b

The counterexample is a function that tests for a multi-item list. It is expressed in the style of Haskell’s case-expression syntax. Several new features of Lazy SmallCheck are demonstrated by this example. (1) Two of the quantified variables, \({\mathsf{\textit{r} }}\) and \({\mathsf{\textit{f} }}\), are functional values. (2) An existential quantifier is used in the property definition. (3) The counterexample found for \({\mathsf{\textit{r} }}\) is concise and understandable.

Previous property-based testing libraries struggle with such a property. The QuickCheck [2] library does not support existentials as random testing ‘would rarely give useful information about an existential property: often there is a unique witness and it is most unlikely to be selected at random [14]’. QuickCheck also requires that functional values be wrapped in a modifier [1] for shrinking and showing purposes.

The original Lazy SmallCheck [14] supports neither existentials nor functional values. SmallCheck [14] supports all the necessary features of the property. However, it takes longer to produce a more complicated looking counterexample. This is because SmallCheck enumerates only fully defined test data and shows functions only in part, by systematically enumerating small arguments and corresponding results.

1.1 Contributions

This paper discusses the design, implementationFootnote 2 and use of new features in Lazy SmallCheck. We present several contributions:

  • An algorithm for checking properties that may contain universal and existential quantifiers in a Lazy SmallCheck-style testing library.

  • A method of lazily generating and displaying functional values, enabling the testing of higher-order properties.

  • An evaluation of these additions with respect to functionality and run-time performance.

1.2 Roadmap

Section 2 is a brief reminder of the Lazy SmallCheck approach to property-based testing. Section 3 demonstrates the new features of the Lazy SmallCheck through several examples. Section 4 describes architectural changes that enable these new features. Section 5 presents the formulation of functional values. Section 6 evaluates the new Lazy SmallCheck in comparison to other Haskell property-based testing libraries. Section 7 offers conclusions and suggestions for further work.

2 The Lazy SmallCheck Search Strategy

A property-based testing library uses a strategy to search the test data space for counterexamples to a given property. For example, QuickCheck [2] randomly selects a fixed number of test-data values. SmallCheck [14], on the other hand, exhaustively constructs all possible values of a particular type, bounded by the depth of construction (or some appropriate metric for non-algebraic types).

Lazy SmallCheck instead begins by testing undefined\(\bot \) — as the value and refines it by need. The demands of the test property guide the exploration of the test-data space. When evaluation of a property depends on an undefined component of the test-data, exactly that component is refined. For algebraic datatypes, undefined is refined to all possible constructions, each with undefined arguments. To ensure termination, when Lazy SmallCheck is run, a bound is set on the depth of possible refinements.

Consider the illustrative property prop_ ListSize. It asserts that all lists with \({\mathsf{\textit{Bool} }}\)-typed elements have lengths less than three.

figure c

Clearly this property is false. Lazy SmallCheck finds the following counterexample where each occurrence of _ means any value.

figure d

As Lazy SmallCheck searches for this counterexample, it refines the test values bound to \({\mathsf{\textit{xs} }}\) as shown in Table 1. Notice that the elements of the list \({\mathsf{\textit{xs} }}\) are never refined as their values are never needed by the property. This pruning effect is the key benefit of Lazy SmallCheck over eager SmallCheck.

Table 1. Values of \({\mathsf{\textit{xs} }}\) used by Lazy SmallCheck when testing prop_ListSizes  xs.

3 New Features in Action

The following examples further illustrate the new features in Lazy SmallCheck. The first generates functional values and displays partial counterexamples. The second shows the benefits of generating small, partial functional values. The final example demonstrates existential quantification.

3.1 Left and Right Folds

Let us look for a counterexample of another conjectured property. This property states that \({\mathsf{\textit{foldl1} }}\;{\mathsf{\textit{f} }}\) gives the same result as \({\mathsf{\textit{foldr1} }}\;{\mathsf{\textit{f} }}\) for non-empty list arguments with natural numbers as the element type.

figure e

As in the original Lazy SmallCheck [14], testing this property requires a \({\mathsf{\textit{Serial} }}\) instance for the \({\mathsf{\textit{Peano} }}\) datatype. Additionally, an \({\mathsf{\textit{Argument} }}\) instance must be defined so that Lazy SmallCheck can produce functional values with \({\mathsf{\textit{Peano} }}\) arguments. We have defined a Template Haskell function [15] — \({\mathsf{\textit{deriveArgument} }}\) — that automatically derives a suitable \({\mathsf{\textit{Argument} }}\) instance. Section 5.2 discusses this in more detail.

figure f

Lazy SmallCheck finds a counterexample at depth 3. The function \({\mathsf{\textit{f} }}\) returns \({\mathsf{\textit{Succ} }}\;{\mathsf{\textit{Zero} }}\) if its input is \({\mathsf{\textit{Zero} }}\) and returns \({\mathsf{\textit{Zero} }}\) in all other cases. The list \({\mathsf{\textit{xs} }}\) is of length three where the last element is \({\mathsf{\textit{Zero} }}\).

figure g

3.2 Generating Predicates

Our next example is based on prop_ PredicateStrings from Claessen [1].

figure h

Lazy SmallCheck finds as a counterexample the function \({\mathsf{\textit{p} }}\) that returns \({\mathsf{\textit{True} }}\) when the second character in its argument is \(\mathtt{'a' }\) and \({\mathsf{\textit{False} }}\) when any other character occurs in the second position. The function is undefined for strings of length less than two.

figure i

Why is this the first counterexample found? We might expect a function that distinguishes an initial ‘L’ from an initial ‘S’. As the depth-bound for testing increases, the extent to which the spines of list arguments can be refined increases. But also the range of character values used in refinements increases and the smallest non-empty range contains just ‘a’.

QuickCheck also finds counterexamples for this property but the functions are stricter. They test equality with one of whole strings "Lazy SmallCheck" or "SmallCheck".

3.3 Prefix of a List

This example is taken from Runciman et al. [14]. We assert that a (flawed) definition of \({\mathsf{\textit{isPrefix} }}\) satisfies a soundness specification of the function.

figure j

In Runciman et al. [14], this property could only be checked by SmallCheck as Lazy SmallCheck did not support existential properties. Running it through the new Lazy SmallCheck gives another concise counterexample: if the first argument of \({\mathsf{\textit{isPrefix} }}\) is a multi-item list with first element Zero, and the second argument is [Zero]; then \({\mathsf{\textit{isPrefix} }}\) incorrectly returns True.

figure k

A smallest counterexample with both \({\mathsf{\textit{xs} }}\) and \({\mathsf{\textit{ys} }}\) non-empty suggests an error in the second equation defining \({\mathsf{\textit{isPrefix} }}\). Indeed, a disjunction has been used in place of a conjunction.

4 Implementation of New Lazy SmallCheck

This section describes in detail how new Lazy SmallCheck achieves the process outlined in Sect. 2. We shall return to the prop_ ListSize example discussed in Sect. 2 to illustrate the data-types used in the implementation.

In places, instead of the actual definitions used in the implementation, we give simpler versions that are less efficient but easier to read. These differences will be summarised in Sect. 4.5.

Abstractions We will make extensive use of the \({\mathsf{\textit{Functor} }}\), \({\mathsf{\textit{Applicative} }}\) and \({\mathsf{\textit{Alternative} }}\) type-classes. All are defined in Fig. 1. Functors are containers with an associated \({\mathsf{\textit{fmap} }}\) operation that applies functions to each contained element. Lists, for example, are functors under the \({\mathsf{\textit{map} }}\) function.

Fig. 1.
figure 1

Definition of \({\mathsf{\textit{Functor} }}\), \({\mathsf{\textit{Applicative} }}\) and \({\mathsf{\textit{Alternative} }}\) type-classes.

Applicative functors [12] extend this by viewing containers as contexts from which values may be obtained. Any ordinary value can be wrapped up in a context using \({\mathsf{\textit{pure} }}\). A function-in-context can be applied to a value-in-context using the \((\mathrel {\mathop <\!\!*\!\!\mathop >})\) operator. Returning to the lists example, \({\mathsf{\textit{pure} }}\) places the value into a singleton list and \({\mathsf{\textit{fs} }}\mathrel {\mathop <\!\!*\!\!\mathop >}{\mathsf{\textit{xs} }}\) applies every function in the collection \({\mathsf{\textit{fs} }}\) to every argument in collection \({\mathsf{\textit{xs} }}\) to obtain a collection of results.

Alternative functors are an extension of applicative functors by the addition of an \({\mathsf{\textit{empty} }}\) container and an operation, \((\mathrel {\mathop <\!|\!\mathop >})\), to merge containers. For lists, \({\mathsf{\textit{empty} }}\) is the empty list and \((\mathrel {\mathop <\!|\!\mathop >})\) is list concatenation.

4.1 Partial Values

Refinement exceptions As highlighted in Sect. 2, the test-data space includes partial values that are refined by need during the search for a counterexample. When the value of an undefined is needed, an exception tagged with the location of the undefined is raised and caught by the testing algorithm. The implementation uses GHC’s user-defined exceptions. [11] The definition of Lazy SmallCheck’s refinement exceptions can be found in Fig. 2.

The \({\mathsf{\textit{Location} }}\) information uniquely identifies the component of a partial test-data value that is needed by a property under test. The \({\mathsf{\textit{Path} }}\) in a \({\mathsf{\textit{Location} }}\) gives directions from the root of a binary-tree representation to some specific subtree. The \({\mathsf{\textit{Nesting} }}\) in a \({\mathsf{\textit{Location} }}\) is akin to a de Bruijn [4] level: it identifies the quantifier for the test-data variable that needs refining.

Fig. 2.
figure 2

Definition of \({\mathsf{\textit{Location} }}\) carrying exceptions.

Fig. 3.
figure 3

Definition of the \({\mathsf{\textit{Partial} }}\) values functor.

Partial values functor A functor of \({\mathsf{\textit{Partial} }}\) values is defined in Fig. 3. The only method of accessing the value inside the \({\mathsf{\textit{Partial} }}\) functor is through \({\mathsf{\textit{runPartial} }}\). It forces the result of a computation using partial values and catches any refinement exception that may be raised.

A \({\mathsf{\textit{Show} }}\) instance is defined so that \({\mathsf{\textit{Partial} }}\) values can be printed. The definition is omitted here but it follows the ‘Chasing Bottoms’ [3] technique. This is what allows the display of wildcard patterns in counterexamples.

Running example Consider the third value, \(\bot \mathbin {:}\bot \), tested in Table 1 from Sect. 2. Here is its simplified representation and the results of two small computations using it.

figure l

The undefined arguments of the list-cons are uniquely tagged by locations. The result of applying prop_ ListSize shows that the second argument is needed. Pretty-printing this partial value hides the complexity underneath.

4.2 Test-Value Terms

The representation of a test-value term contains \({\mathsf{\textit{tValue} }}\), the information needed to obtain a partial test-data value, and \({\mathsf{\textit{tRefine} }}\), its possible refinements. The \({\mathsf{\textit{Term} }}\) datatype is defined in Fig. 4.

Fig. 4.
figure 4

Definition of test-value terms and a merging operation.

The \({\mathsf{\textit{Applicative} }}\) instance for terms shows how: (1) the \({\mathsf{\textit{Path} }}\) component of a location is extended through the argument of \({\mathsf{\textit{tValue} }}\) and (2) the \({\mathsf{\textit{tRefine} }}\) uses this information to pass the rest of the path to the relevant subterm.

The \({\mathsf{\textit{mergeTerms} }}\) function demonstrates how a collection of terms can be turned into a single undefined value paired with the ability to obtain the collection when required. This is key to the strategy illustrated in Sect. 2.

Test-value environments After test data is generated but before a property is applied to it, a pretty-printed representation of the partial value is recorded. The benefit of this technique is that we need not record a pretty-printing that could be obtained from the final test-value derived from the term. This will be especially useful for the display of functional values in Sect. 5.

The test-value environments type is shown in Fig. 5. We omit \({\mathsf{\textit{AlignedString} }}\) in this paper but it follows established pretty-printing techniques, such as that used by Hughes [7].

Fig. 5.
figure 5

Definition of test-value environments.

4.3 Test-Value Series Generators

Series functor Properties are tested against a series of depth-bounded test-data terms. The Lazy SmallCheck library defines instances for the test-data \({\mathsf{\textit{Series} }}\) functor that implicitly enforces depth-bounding and the introduction of partial test-data values. These definitions are in Fig. 6.

As with the original Lazy SmallCheck, a depth-cost is only introduced on the right-hand side of binary applications so that each child of a constructor is bounded by the same depth.

Fig. 6.
figure 6

Definition of \({\mathsf{\textit{Series} }}\) generators.

Fig. 7.
figure 7

Definition of the \({\mathsf{\textit{Serial} }}\) type-class.

Running example The following are definitions for depth-bounded values of Booleans, polymorphic lists and Boolean lists.

figure m

Serial class A class of \({\mathsf{\textit{Serial} }}\) types is defined in Fig. 7. Lazy SmallCheck uses \({\mathsf{\textit{Serial} }}\) instances to automatically generate test values for argument variables in properties. Using the generic \({\mathsf{\textit{Series} }}\) operators of Fig. 6, a family of \(cons_n\) combinators can be defined exactly as described by Runciman et al. [14].

Running example again The library defines the series generators for many data-types. The \({\mathsf{\textit{Serial} }}\) instances for \({\mathsf{\textit{Bool} }}\) and lists are as below. Notice that we no longer explicitly define how the arguments of list-cons are instantiated. It is automatically handled by the type system.

figure n

4.4 Properties and Their Refutation

Properties The \({\mathsf{\textit{Property} }}\) data-type in Fig. 8 defines the abstract syntax of a domain-specific language. It includes standard Boolean operators. Crucially, it also provides a representation of universal and existential quantifiers that supports searches for counterexamples and witnesses.

Fig. 8.
figure 8

The underlying representation of the \({\mathsf{\textit{Property} }}\) DSL.

Though not defined here, smart wrappers are provided for all six \({\mathsf{\textit{Property} }}\) constructions. These automatically lift \({\mathsf{\textit{Bool} }}\)-typed expressions to \({\mathsf{\textit{Property} }}\) and instantiate free variables in properties with appropriate series from \({\mathsf{\textit{Serial} }}\) instances.

Refutation of properties The \({\mathsf{\textit{depthCheck} }}\) function takes as arguments an integer depth-bound and a \({\mathsf{\textit{Testable} }}\) property that may contain free variables of types of any \({\mathsf{\textit{Serial} }}\) type. The \({\mathsf{\textit{counterexample} }}\) and \({\mathsf{\textit{refute} }}\) functions given in Fig. 9 search for a failing example.

A key point to observe is that \({\mathsf{\textit{refute} }}\) recurses when it encounters a nested quantification. All refinement requests must therefore be tagged with the \({\mathsf{\textit{Nesting} }}\) level for the associated quantifier. The \({\mathsf{\textit{RefineAt} }}\) information can then be passed onto the relevant \({\mathsf{\textit{tRefine} }}\) function. Those refined terms are then prepended onto the list of terms left to test.

Fig. 9.
figure 9

Definition of the refutation algorithm.

4.5 Differences Between Versions of Lazy SmallCheck

The main differences between the new Lazy SmallCheck and the original Lazy SmallCheck described in [14] are as follows. In the new implementation:

  • Terms are always represented in a type-specific way. Previously they were generated from a generic description.

  • Terms can carry a test-value environment enabling the display of test-data types (such as functions) that cannot be directly pretty-printed.

  • The testing algorithm calls itself recursively, refining information about enclosing quantifiers.

The main differences between real implementation of the new Lazy SmallCheck and the slightly simplified variant described in this paper are as follows. In the real implementation:

  • The \({\mathsf{\textit{Path} }}\) datatype is a difference list to optimise the list-snoc operation.

  • Terms representing total and partial values are distinguished to optimise performance and to allow the use of existing \({\mathsf{\textit{Show} }}\) instances for total terms.

  • Terms representing partial values record the total number of potential refined values they represent up to the depth bound. The refutation algorithm counts the actual number of refinements performed. (This is useful for performance measurements and comparison with other approaches.)

5 Implementing Functional Values

The key to generating functional values is the ability to represent them as tries, also known as prefix trees. New Lazy SmallCheck supports the derivation of appropriate tries for given argument types, and the conversion of tries into functions to be used as test values.

The use of test-value environments allows a trie to be pretty-printed before it is converted into a Haskell function. This removes the need for the kind of modifier used by Claessen [1].

5.1 Trie Representations of Functions

Fig. 10.
figure 10

Definition of the two-level trie data structure.

We define a generic trie datatype in Fig. 10. It is expressed as a two-level, mutually recursive GADT. Level one describes functions that either ignore their argument — \({\mathsf{\textit{Wild} }}\), or perform a case inspection of it — \({\mathsf{\textit{Case} }}\).

Level two represents details of a case inspection. The \({\mathsf{\textit{Valu} }}\) construction occurs when the argument is of unit type and therefore returns the single result. The \({\mathsf{\textit{Sum} }}\) construction represents functions with a tagged union as argument type, performing further inspection on their constituent types. The \({\mathsf{\textit{Prod} }}\) construction represents functions with arguments of a product type, producing a trie that first inspects the left component of the product, then the right to return a value.

A construction \({\mathsf{\textit{Natu} }}\;{\mathsf{\textit{vs} }}\;{\mathsf{\textit{v} }}\) represents a function with a natural number argument. If an argument \({\mathsf{\textit{n} }}\) is less than the length of \({\mathsf{\textit{vs} }}\), the value of \({\mathsf{\textit{vs} }}\mathbin {!!}{\mathsf{\textit{n} }}\) is returned. Otherwise \({\mathsf{\textit{v} }}\) is returned as default. The \({\mathsf{\textit{Cast} }}\) construction is used in all other cases. We shall say more about it in Sect. 5.2. The function \({\mathsf{\textit{applyT} }}\) converts a trie into a Haskell function.

5.2 Custom Data-Types for Functional Value Arguments

The \({\mathsf{\textit{Argument} }}\) class is defined in Fig. 11. Users supply an instance \({\mathsf{\textit{Argument} }}\;{\mathsf{\textit{t} }}\) to enable generated functional test values with an argument of type \({\mathsf{\textit{t} }}\). Each instance defines a base type representation and an isomorphism between the argument type and the base type. This is a variation of the generic trie technique used by Hinze [6]. The \({\mathsf{\textit{Cast} }}\) construction of the trie datatype performs the necessary type conversions using the \({\mathsf{\textit{Argument} }}\) instances.

The \({\mathsf{\textit{BaseCast} }}\) functor is used at recursive points to prevent infinite representations of recursive datatypes. It is a type-level thunk indicating that an arbitrary type can be translated into a \({\mathsf{\textit{Base} }}\) type. For example, Fig. 12 shows the \({\mathsf{\textit{Argument} }}\;{\mathsf{\textit{Peano} }}\) instance. The Template Haskell function \({\mathsf{\textit{deriveArgument} }}\) automatically produces \({\mathsf{\textit{Argument} }}\) instances for any Haskell 98 type.

Fig. 11.
figure 11

Definition of the \({\mathsf{\textit{Argument} }}\) type-class.

Fig. 12.
figure 12

The \({\mathsf{\textit{Argument} }}\) instance for \({\mathsf{\textit{Peano} }}\).

Fig. 13.
figure 13

Definition of \({\mathsf{\textit{Series} }}\) generators for tries and functions.

5.3 Serial Instances of Functional Values

Functional values have been reified through the trie datatype, so we first need to define series of types. The \({\mathsf{\textit{Serial} }}\) instances are defined in Fig. 13. A special type-class \({\mathsf{\textit{SerialL2} }}\) is defined. It represents types that can be represented as trie constructions. The applicative operators with a carret suffix introduce no depth cost, as opposed to those defined in Sect. 4.3. These specialist operators have been carefully placed to give a natural depth metric for functions while keeping the series finite.

Using these definitions, a \({\mathsf{\textit{Serial} }}\) instance for functional values is defined. The default definition of \({\mathsf{\textit{seriesWithEnv} }}\) is overridden to store the pretty-printed form of the trie before it is converted into a Haskell function. This instance definition is omitted here due to lack of space.

Table 2. Comparision of property-based testing library features.

6 Discussion and Related Work

A feature comparison of several Haskell property-based testing libraries can be found in Table 2. The test-space exploration strategy is the main distinction between the QuickCheck library and SmallCheck family of libraries. QuickCheck assumes that test data detecting a failure is likely within some probability distribution. SmallCheck, on the other hand, appeals to the Small Scope hypothesis [8] — programming errors are likely to appear for small test data.

6.1 Runtime Performance

The repository includes performance benchmarks to compare this implementation with the previously published Lazy SmallCheck. Experiments performed using GHC 7.6.1 with -O2 optimisation on a 2GHz quad-core PC with 16GB of RAM show very little difference in execution times between the two encodings.

6.2 Functional Values

The original QuickCheck paper [2] explains how functional test values can be generated through the \({\mathsf{\textit{Arbitrary} }}\) instance of functions with a \({\mathsf{\textit{Coarbitrary} }}\) instance of argument types. At this stage, QuickCheck could not display the failing example without bespoke use of the \({\mathsf{\textit{whenFail} }}\) property combinator.

QuickCheck has since gained the ability not only to display functional counterexamples but also to reduce their complexity through shrinking. Claessen [1] achieves this by transforming functions generated using the existing \({\mathsf{\textit{Coarbitrary} }}\) technique into tries.

Claessen’s formulation of tries slightly differs from ours. Existential types are used in place of type families and there is no provision for non-strict functions. Partiality of functions is explicitly expressed instead of being a result of partially defined tries. Claessen also requires that functions are wrapped in a ‘modifier’ at quantification binding. This \({\mathsf{\textit{Fun} }}\) modifier retains information for showing and shrinking at the expense of a slightly more complex interface presented to users.

In Lazy SmallCheck, on the other hand, we directly generate a trie and then convert it into a Haskell function. A pretty-printed representation of the trie is stored at the time of generation and retrieved for counterexample output.

The SmallCheck representation of functional values uses a \({\mathsf{\textit{coseries} }}\) approach, analogous to QuickCheck’s \({\mathsf{\textit{Coarbitrary} }}\). However, functional values are displayed by systematically enumerating arguments.

6.3 Existential and Nested Quantification

As previously discussed in Sect. 1, it does not make sense to use QuickCheck for existential quantification. The previous design of Lazy SmallCheck made it difficult to conceive of a refutation algorithm that could handle the nested quantification required to make existential properties useful.

The use of the \({\mathsf{\textit{Partial} }}\) values functor in this implementation gives statically typed guarantees that term refinements are performed at the correct quantifier nesting.

6.4 Benefits of Laziness

Runciman et al. [14] discussed the benefits and fragility of exploiting the laziness of the host language to prune the test-data search space. When applied to functional values, we see further benefits. The partiality of a trie representation corresponds directly with the partiality of the function it represents. Whereas Claessen [1] needs to shrink total function to partial functions, the latest Lazy SmallCheck has partial functions as a natural result of its construction.

7 Conclusions and Further Work

This paper has described the extension of Lazy SmallCheck with several new features; (1) quantification over functional values, (2) existential and nested quantification in properties and (3) the display of partial counterexamples.

Properties that quantify over functional values occur often in higher-order functional programming. Similarly, many properties may involve existential quantification and even nesting of quantification within property definitions. The examples in this paper have demonstrated the power of a tool that can find counterexamples for such properties.

This paper takes an extensional view of functional values, characterising them as mappings from input to output. An alternative would be to characterise functions intensionally as lambda abstractions or other defining expressions, perhaps allowing recursion [9, 10]. We would expect the generic machinery for typed functional series to be more complex. Also, when functions are needed as test values, alternative definitions of the same extensional function are not interesting [13].

Parallelisation of the refutation algorithm is a current area of investigation. A prototype implementation shows near-linear speedups, in multicore shared-memory environments, for benchmarks in which no counterexample is found. This benefit is derived from the tree structure of the Lazy SmallCheck test-value search space. However, in some benchmarks where a counterexample is found the overheads of continued searches in other threads can cause slowdowns rather than speedups.