Keywords

1 Introduction

The number of decentralized applications (DApps) running on top of blockchain networks is growing very fast. According to  [3], there are now more than 3,000 DApps available on the Ethereum and EOS platforms which generate over 600 thousand transactions per day for a volume of 17 million USD. These DApps interact with the blockchain through around 4,700 smart contracts.

A smart contract is a stateful program stored in the blockchain with which a user (human or computer) can interact. From a legal perspective, a smart contract is an agreement whose execution is automated. As such, its revocation or modification is not always possible and worse than that, what the code of a smart contract does is the law\(\ldots \) no matter what it may end up doing. Unfortunately, like any program, smart contracts may have bugs. Given the potential financial risks, finding these bugs before the origination of the contracts in the blockchain is an important challenge, both from economic and scientific points of view.

Various formal methods have been used to verify smart contracts. In  [5], the authors present a shallow embedding of Solidity within F\(^*\), a programming language aimed at verification. Other similar approaches are based on deductive verification platforms like Why3  [10, 12]. Interactive proof assistants (e.g. Isabelle/HOL or Coq) have also been used for modeling and proving properties about Ethereum and Tezos smart contracts  [1, 4].

A common thread here is the use of general-purpose frameworks based on sequential modeling languages. However, smart contracts can be considered as state machines  [2, 9] whose execution model, according to  [13], is closer to that of a concurrent programming language rather than a sequential one. In this context, the use of model checking techniques becomes highly appropriate  [6, 11].

An important aspect of blockchains is that they are completely open. As a consequence, smart contracts are state machines that need to be conceived for an unknown (and potentially varying) number of users. This parameterized side of blockchains has not previously been taken into account.

In this paper, we propose a first ad-hoc attempt to model smart contracts into the declarative input language of Cubicle, a model checker for parameterized systems based on Satisfiability Modulo Theories (SMT) techniques. In our approach, smart contracts, as well as the transactional model of the blockchain, are encoded as a state machine on which safety properties of interest are encoded and verified. Our contributions are as follows:

  • A two-layer framework for smart contract verification in Cubicle (Sect. 3). The first layer is a model of the blockchain transaction mechanism. The second layer models the smart contract itself.

  • A description of how to express smart contract properties as Cubicle safety properties using both ghost variables and model instrumentation (Sect. 4)

  • A way of interpreting Cubicle error traces as part of the smart contract development cycle (Sect. 5)

2 A Motivating Example

We illustrate our work with the example of an auction contract. The behavior of each client i is given by the automaton \(\mathcal {A}(i)\) in Fig. 1.

Every client starts off in state S1 where they can either bid a certain value v and go to state S2, or close the auction and stay in S1. In S2, a client can either withdraw their bid and end up back in S1, close the auction and stay in S2, or try to win the auction and go to S3.

Each state transition is guarded by certain conditions to ensure that the auction works correctly. To implement these conditions, each automaton needs to share some variables with the other automata and be able to send or receive messages through communication channels. The shared variables are:

  • HBidder: the current highest bidder (winner)

  • HBid: the above’s bid amount

  • Ended: a boolean variable for whether or not an auction is over

  • Owner: the person who owns the contract

  • PR\(_i\): the amount a client i can withdraw,

while channels for message passing are bid, withdraw, end, win, and \(\textsf {refund}_i\) (a refund channel for each client i). These messages are synchronous and can have parameters. Sending a message with parameter c on channel \(\textsf {ch}\) is denoted by \(\textsf {ch}!(c)\), and the reception is written \(\textsf {ch}?(v)\).

Fig. 1.
figure 1

Auction automaton \(\mathcal {A}(i)\)

In order to better explain how the automaton works, we’ll look at states S1 and S2, and the transitions between them.

The condition to trigger the transition from S1 to S2 is given by the formula \(\textsf {bid}?(v) \wedge v > \textsf {HBid} \wedge i \ne \textsf {HBidder} \wedge \lnot \textsf {Ended}\), which should be read as follows:

  • a message on channel bid is received with a value v superior to the current HBid (\(v > \textsf {HBid}\));

  • the client isn’t the current winner (\(i \ne \textsf {HBidder}\));

  • the auction is still open (\(\lnot \textsf {Ended}\)).

If these requirements are fulfilled, the transition is triggered, the client goes to S2 and the state variables are modified in the following manner:

  • \(\textsf {PR}_{\textsf {HBidder}} := \textsf {HBid}\) sets the pending returns for the old winner to their old winning bid;

  • \(\textsf {HBid} := v\) sets the new top bid to v;

  • \(\textsf {HBidder} := i\) sets the new winner to the client i.

Fig. 2.
figure 2

Refund automaton \(\mathcal {R}(i)\)

Going from S2 back to S1 works the same way. The transition can be triggered when the condition \(\textsf {withdraw}? \wedge i\ne \textsf {HBidder}\wedge \textsf {PR}_i>0\) is true, that is when a message on withdraw is received and the client i is not the highest bidder and has some amount to withdraw. When moving to S2, the variable \(\textsf {PR}_i\) is reset and a message \(\textsf {refund}_i!(\textsf {PR}_i)\) is sent. The corresponding reception \(\textsf {refund}_i?(v)\) is part of a refund automaton \(\mathcal {R}(i)\) run by each client, as seen in Fig. 2. The role of this one state automaton is to accept that kind of message and do whatever necessary to accept the refunded value v.

Finally, the global system of the auction and the clients it interacts with can be expressed as a product of the auction automata and the refund automata

$$\begin{aligned} \varPi _i \mathcal {A}(i) \times \varPi _i \mathcal {R}(i) \end{aligned}$$

A client of this contract, aka the automaton, needs to be sure that certain properties hold. These properties can be simple and visible in the transition requirements, or they can be more complex to the point where you can’t prove them through the requirements alone. Consider two properties:

  1. (a)

    “Each new winning bid is superior to the old winning bid”

  2. (b)

    “I do not lose money”

Property (a) is easy to see as it’s the requirement \(v > \textsf {HBid}\) of Fig. 1. Property (b) is less obvious as it requires not only a model of the contract itself, but also of the underlying blockchain semantics.

3 Modeling a Smart Contract with Cubicle

Anything done by a smart contract can be traced back to its blockchain. If HBidder is modified, that means that there was a message on channel bid with a sufficiently large value. Not being able to trace an action back to the blockchain implies a problem. Therefore, modeling a smart contract requires an accompanying model of the blockchain. We do this with the help of Cubicle, briefly introduced in the next subsection.

3.1 Cubicle

Cubicle is an SMT-based model checker for parameterized transition systems. For a more in-depth and thorough explanation, we refer the reader to [7, 8]. In this section, we give a quick overview of the necessary aspects of Cubicle.

Cubicle input programs represent transition systems described by: (1) a set of type, variable and array declarations; (2) a formula for the initial states; and (3) a set of guarded commands (transitions).

Type, Variable and Array Declarations. Cubicle has several built-in data types, among which are integers (int), booleans (bool), and process identifiers (proc). Additionally, the user can define enumerations. For instance, the code

figure a

defines a type location with three constructors (L1, L2, and L3), two global variables W and X of types location and int, respectively, and a proc-indexed array Z. The type proc is a key ingredient here as it is used to parameterize the system: given a process identifier i, the value Z[ i ] represents somehow the local variable Z of i.

Initial States. The content of a system state is fully characterized by the value of its global variables and arrays. The initial states are defined by an init formula given as a universal conjunction of literals. For example, the following declaration

figure b

should be read as: “initially, for all process i, Z[i] is equal to False and W contains L1”. (Note that the content of variable X is unspecified, and can thus contain any value)

Transitions. The execution of a parameterized system is defined by a set of guard/action transitions. It consists of an infinite loop which non-deterministically triggers at each iteration a transition whose guard is true and whose action is to update state variables. Each transition can take one or several process identifiers as arguments. A guard is a conjunction of literals (equations, disequations or inequations) and an action is a set of variable assignments or array updates. For instance, the following transition

figure c

should be read as follows: “if there exists a process i such that Z[i] equals False, then atomically assign W to L2 and X to 1”.

Unsafe States. The safety properties to be verified are expressed in their negated form and characterize unsafe states. They are given by existentially-quantified formulas. For instance, the following unsafe formula

figure d

should be read as follows: “a state is unsafe if there exists a process i such that Z[i] is equal to False and X equals 1”.

Error Traces. All of the above allows Cubicle to verify a model. If it finds a way to reach an unsafe state, an error trace is printed, such as the following

figure e

This lets the user check which sequence of transitions led to the unsafe state. A number preceded by # is a process identifier. This means that t2(#1) stands for process 1 activating that transition. If you have multiple unsafe states declared, the index next to lets you know which one was reached.

3.2 Blockchain Model

To model the blockchain we first need to model the elements that will constitute transactions seen in the blockchain. Consider the transactions as the message passing mechanism from Sect. 2.

figure g

The constructors of type call represent calls to smart contract entry points (message channels). Bid, Withdraw, Finish, Send correspond to the channels bid, withdraw, end, and refund, while None means absence of transactions. The elements of a transaction are defined by four variables:

  • Cmd, the calls to an entry point;

  • Value, the amount of money attached to a transaction;

  • Sender, who calls the contract;

  • Recv; the receiver, used in the case of Withdraw, where the contract calls a client.

Once the elements of a transaction are declared, the next step is to model the transaction mechanism of the blockchain. For that, we define three transitions to simulate transactions to the three smart contract entry points.

figure h

Each transaction has a parameter i which represents the client who called the corresponding entry point. The only requirement indicates that the contract can’t be doing something else simultaneously (Cmd = None). The effects of these transitions are simple: Cmd is set to the corresponding constructor (Bid, Withdraw, or Finish, respectively) and the variable Sender is assigned to i. In call_bid, the variable Value is set to a (positive) random integer corresponding to the amount bid by i.

Once the blockchain has been modeled, we can move on to modeling the contract itself.

3.3 Smart Contract Model

To model the actual contract, we need to model its variables and its functions. These correspond to the state variables and the transitions from Sect. 2, respectively.

figure i

The transitions from state S1 to S2 and back are modeled as Cubicle transitions. These transitions serve as entry points for our contract.

figure j

Transition bid is called by one process, i, who has to be the Sender, but not the current HBidder. The other requirements should be read as follows:

figure k

The effects are simple, HBidder and HBid are set to the new values, PR for the old winner who has now been outbid is set to his old bid value, and Cmd is reset to None to indicate that the contract is no longer occupied.

Similarly, the requirements of transition withdraw are the following:

figure l

The effects of transition withdraw are slightly different since withdraw goes on to send money to whoever called the method. The receiver is now set to i (Recv := i), and the value that will accompany the transaction is set to the amount of money to be returned (Value := PR[ i ]). The pending return PR[ i ] is reset globally for client i (PR[ i ] = 0) and Cmd is set to Send, to indicate that the contract is calling the client’s method. The transition which Send corresponds to can be seen below:

figure m

This transition checks that Send was in fact called (Cmd = Send), as well as the fact that the receiver is the currently active process (Recv = i). It then resets Cmd to None to free the contract.

4 Defining and Verifying Properties

Recall that we want to be sure of certain properties:

  1. (a)

    “Each new winning bid is superior to the old winning bid”

  2. (b)

    “I do not lose money”

Once defined informally, the properties need to be converted into safety properties. This is not always straightforward and might require additional information. It is done via a two-step process consisting of (i) defining extra logical formulas (ghost variables) and (ii) instrumenting the model with these formulas.

4.1 Ghost Variables and Model Instrumentation

Ghost variables, introduced below, do not appear in the contract’s state variables, nor do they impact the Cubicle model outside of property verification.

figure n

The variables Out and In are for property (b). In is an array storing how much each client (aka process) bids, and Out stores how much they get back if/when they call withdraw. Old_HBid tracks the old highest bid for property (a). The code below is the instrumented model. Transition withdraw has been omitted since it is not instrumented.

figure o

The ghost variables appear only in the action parts of the transitions. The bid transition updates HBid to set a new highest bid value. To keep track of what the old value was, Old_HBid is set to HBid’s value. In is updated for the new bidder with their bid value. The transition value is instrumented instead of withdraw, since the most important action, giving the client back their money, happens during value. It uses Out to keep track of the money that’s been returned.

The ghost variables are also part of the initial state declaration.

figure p

That is to say, the auction hasn’t ended, there is no winning bid, the contract isn’t doing anything, and no one has bid and subsequently withdrawn money.

4.2 Defining Properties

Once the code is instrumented, we can introduce the safety properties we want Cubicle to check.

Property (a): New bids are higher

The first property is “Each new winning bid is superior to the old winning bid”. This property can be easily defined by the following unsafe formula which uses only the ghost variables Old_HBid and HBid.

figure q

Checking property (a) with the above formula simply means declaring Old_HBid being superior to HBid as unsafe, but only if the model was correctly instrumented with these variables.

Property (b): Do I lose money?

Defining this property is less obvious. While ghost variables have been introduced to keep track of money exchanges between users and the contract, another problem is the lack of precision of the sentence. When should we check that a user did not lose money? At the end of the auction? If so, when do we consider the auction to really be over?

We will make these issues more concrete in the next section. In particular, we shall explain how we arrived at the following formulation of property (b):

figure r

5 Interpreting Cubicle Error Traces

As stated previously, the tricky property is “I do not lose money”. The logical implication is that if the auction is over, (Ended = True), then your Out isn’t less than your In.

figure s

Except Cubicle prints the following error trace:

figure t

Upon further inspection, it becomes obvious why this state is reached. This is true for every client, even the winner, who technically does lose money, so to speak. We modify our unsafe state to the following by adding that the process cannot be the winner (HBidder \(<>\) i).

figure u

However, Cubicle still says

figure v

as what’s missing is checking whether or not a client withdrew their bid. We incorporate that check below.

figure w

but Cubicle can still reach that state:

figure x

Once the smart contract has completely finished every action associated with a function (i.e. transition), it resets Cmd to None, which we haven’t checked for. We add that to our unsafe state.

figure y

The above is the correct implementation of “I do not lose money”. This time Cubicle replies Safe

But error traces aren’t always the result of incorrectly written unsafe states. The automaton in Fig. 1 is incorrect. Modeling its bid requirements in Cubicle gives the following model and subsequent error trace:

figure z
figure aa

This is due to the requirements in Fig. 1 for bid not matching bid’s effects on the state variables. When a client is outbid, they can’t bid again without first having withdrawn their old bid. This is seen in Fig. 1 when bid sets PR\(_i\) to v with = instead of +=. This means that if an old bid wasn’t withdrawn, its value will be overwritten and the client will lose that amount. The way to fix this is to add \(\textsf {PR}_i = 0\) to bid’s requirements in Fig. 1

6 Related Work

By their nature, smart contracts lend themselves well to formal verification. Numerous approaches have been used to verify smart contracts. In [11], the authors create a three-fold model of Ethereum smart contracts and apply model checking in order to verify them. The authors of [9] introduce a finite-state machine model of smart contracts along with predefined design patterns, allowing developers to conceive smart contracts as finite-state machines, which can then be translated to Solidity. Deductive verification platforms like Why3 are used to verify specific properties of smart contracts. The authors of [10] use Why3 to verify RTE properties like integer overflow and functional properties, like the success of a transaction. Interactive proof assistants are also used in smart contract verification, such as Isabelle/HOL in [4] in order to check Ethereum bytecode. A major trend is the focus on a specific language, most notably Solidity, whereas we propose a more general framework, not tied to any concrete language. Our application of parameterized model-checking allows us to address the parametric aspects of smart contracts and treat them as concurrent programs.

7 Conclusion and Future Work

In this paper we proposed a two-layer framework for smart contract verification with the model checker Cubicle. This method implements a model of the smart contract itself and the blockchain transaction mechanism behind it. Our method introduces a way of verifying various types of functional properties linked to a smart contract as Cubicle safety properties. Since this is done through ghost variables and model instrumentation, it has no impact on the original smart contract itself, meaning it is independent of any particular smart contract language, and is therefore generalizable and usable for multiple smart contract languages. We also describe a way of interpreting potential error traces generated by Cubicle, and how they can aid in the development of a smart contract. An immediate line of future work is to automate this stepwise process. We need to define an abstract high-level language to express the properties to be checked by Cubicle. From this language, the ghost variables will be automatically generated to instrument the Cubicle code. Furthermore, we would also like to consider automatic translation of Solidity or Michelson code to Cubicle.