Keywords

1 Introduction

A smart contract is a computer program written with an intent of running it on a blockchain [15]. A blockchain, as a substrate of running a smart contract, offers certain guarantees when events occur in the running of that smart contract; for example, when the smart contract transmits digital currency to another smart contract and the latter accepts it, both the sender and the receiver can be assured that these events have indeed occurred, and that this fact is recorded immutably.

We focus on smart contracts written for the Ethereum blockchain [11]. Ethereum provides a so-called Ethereum Virtual Machine (EVM) within which a smart contract runs. The language an EVM understands and executes is called EVM code [33]. EVM code is a somewhat low-level language, with instructions similar to those we find in an assembly language. From the standpoint of expressive power, EVM has been called quasi-Turing-complete; the rationale from Wood [33] is: “the quasi qualification comes from the fact that the computation is intrinsically bounded through a parameter, gas, which limits the total amount of computation done”. (In Sect. 4, we address the computational complexity of verification problems of interest to us more clearly). Typically, a programmer writes a smart contract in a higher-level language such as Solidity [10], which is then compiled to EVM code for deployment in Ethereum.

Fig. 1.
figure 1

The Solidity code for Lotto.sol from the curated dataset of Durieux et al. [8] is to the left. To the right is a portion of its complied EVM code as bytecode to the top, and the bytecode written as human-recognizable instructions and their arguments below it. A number in square brackets, e.g., “[48] ”, is the byte number within the bytecode, starting at 0, of the particular instruction + arguments.

Figure 1 shows a smart contract written in Solidity and a portion of its encoding as EVM code. As the Solidity code suggests, there is the notion of a contract, within which one can specify functions. The address data type identifies a publicly visible contract that resides on the Ethereum blockchain. A function in a contract may be invoked from another contract, sometimes with arguments. A function may invoke another contract, for example, via a send() call as shown in the example code. EVM code, as we show to the right of the figure, is a string of bytes. Each byte is either an instruction, e.g., PUSH4, or an argument to an instruction, e.g., 0x0072.

Our Work. Verification as to whether a smart contract contains security vulnerabilities is a well-recognized problem in both research and practice. There are a number of software tools that address this; we present some of the most closely related to our work in Sect. 2. Our work is motivated by a recent empirical assessment of several such tools [8], which points out that they suffer high numbers of false positives and false negatives when they check for vulnerabilities from common occurring classes [23] in real-world smart contracts. (A false positive is a tool reporting a vulnerability where there is none. A false negative is a vulnerability that goes undetected by the tool). We think that the underlying reason is that existing tools “try to do everything, none of them well.” Said differently, the poor performance of existing tools leads us to ask: if we were to focus on even one class of vulnerabilities only, can there exist a tool that for real-world smart contracts, is highly effective? Lest one mistakenly think that checking for only one class is somehow “easy”, as we discuss in Sect. 3, even a basic verification problem, such as whether a contract is guaranteed to halt when run on some input, is intractable for Ethereum smart contracts.

In this work, we answer the above question in the affirmative. Our proof is by construction—we discuss our reduction to model-checking, the use of Linear Temporal Logic (LTL) to meaningfully encode the property to be checked for, and the use of the off-the-shelf model checker nuXmv [3]. Our choice of the class of vulnerabilities is unchecked low-level calls. An unchecked low-level call is an invocation to another contract whose return value is not checked. As we establish in Sect. 3, this problem is undecidable in general, and \({\textbf {PSPACE}} \text {-complete}\) under the finitizing assumptions we adopt. Our empirical results on almost 200 contracts from the benchmark of Durieux et al. [8] suggest that our approach is highly effective—we see zero false positives and negatives.

The remainder of this paper is organized as follows. In the next section, we discuss related work. In Sect. 3 we address the computational complexity of the problem. In Sect. 4, we present our reduction to model-checking and encoding in SMV. In Sect. 5, we discuss our empirical validation. We conclude with Sect. 6 in which we discuss also future work.

2 Related Work

Our work addresses checking a smart contract for security vulnerabilities that belong to commonly occurring classes of such vulnerabilities. As such, our work is related to work that proposes classifications of security vulnerabilities in smart contracts, and tools and techniques for detecting vulnerabilities. From the standpoint of tools, we focus on one class of vulnerabilities only, but are highly effective for that class. That is the key distinction between our work and all prior work on tools. We leverage a particular classification [23], but otherwise make no contributions to that aspect.

From the standpoint of classifications, several exist—Rameder [27] and Tolmach et al. [31] provide comprehensive surveys. We adopt the classification and taxonomy of DASP [23], which in turn is adopted by the empirical work of Durieux et al. [8]. The work of Durieux et al. [8] proposes two datasets of smart contracts which they have made available publicly [9], and using which they have assessed several existing tools and demonstrated that those tools suffer from too many false negatives. One of these is a curated set, with several smart contracts in each of the categories of DASP. Their other dataset is a “wild” dataset of several thousand smart contracts. We rely on their datasets for our empirical assessment. There have been other datasets that have been proposed, such as that of Ghaleb and Pattabiraman [13].

As the surveys of Rameder [27] and Tolmach et al. [31] express, there are several tools which analyze smart contracts for security vulnerabilities; a comprehensive discussion is beyond the scope of this work. In the remainder of this section, we survey several of the prior tools. We emphasize that none, to our knowledge, claims to suffer zero false positives and negatives for even one class of security vulnerabilities. That is the main difference between our work and prior approaches.

A piece of work that is closest to ours is that of Nehai et al. [24]. That work proposes applying model checking, and specifically the use of nuSmv, to “capture the behaviour of the Ethereum blockchain, the smart contracts themselves and the execution framework”. Many of the other tools are based on verification techniques that check for security assurance and correctness of smart contracts. For example, Bhargavan et al. [2] create a framework that compiles contracts written in Solidity to F* and checks for, e.g., safety with respect to runtime errors; and decompiles EVM bytecode into F* code to check for low-level properties such as bounds on the amount of gas required for a transaction. Other tools, such as the one created by Amani et al. [1] extend formalisations of the Ethereum Virtual Machine and use theorem provers such as Isabelle or Coq to check for reachability properties in smart contracts, such as termination. There are symbolic execution frameworks, such as Manticore [21], which implement state exploration, but do not target smart contract vulnerabilities that can be exploited by attacks.

Oyente [19], however, does so. This symbolic execution tool uses the Z3 solver to determine the feasibility of execution traces of smart contracts (input as bytecode) to detect given vulnerabilities such as transaction-order dependence. Kalra et al. [16] argue that Oyente is neither sound nor complete and propose a symbolic model checker, Zeus, for Solidity-based smart contracts which categorizes them into incorrect and unfair groups, and further detects vulnerabilities such as re-entrancy in the former and ‘incorrect logic’ in the latter. Zeus inserts assertions based on vulnerabilities into smart contract code and, after translation to LLVM bitcode, uses the SeaHorn verifier to determine assertion violations. Zeus’ soundness claims are refuted by the creators of eThor [30]: a sound and static analyzer for bytecode. This tool creates semantic abstractions based off of the blockchain environment, gas modelling, the memory model, and the callstack and scans for reachability properties to search for re-entrancy bugs.

Several other symbolic execution tools exist. MadMax [14] uses another tool, Vandal [4] to translate EVM bytecode to an intermediate representation and then detects out-of-gas exceptions. Maian [25] (based on Oyente) detects three types of vulnerable contracts: suicidal (can be killed by anyone), prodigal (can send Ether to anyone), and greedy (cannot have Ether extracted from). Mythril [22] uses a symbolic interpreter for EVM bytecode known as LASER to find abstract program states and reason about their reachability (using a Z3 solver) given certain conditions to determine vulnerabilities (such as integer overflow); then uses concolic testing to deign whether the vulnerabilities are exploitable. Securify [32] is a security analyzer for smart contracts that takes as input the bytecode of a contract and checks for compliance and violation security patterns such as locked Ether.

teEther [18] is an automated framework which scans bytecode for instructions (such as CALL) that can be used to extract Ether, and then creates exploits. Another tool (which focuses on instructions related to inter-contract reasoning, memory, and hash functions) is the bounded model checker EthBMC [12]. The creators of the tool test it against others upon toy smart contract examples before conducting a large-scale analysis and comparing their results with those received from their experiments using teEther and Maian. That work has since been followed-up by those of Perez et al. [26] Zhang et al. [35], Rodler et al. [28], Chinen et al. [7], Cecchetti et al. [5] and Chen et al. [6]. All these pieces of work except those of Rodler et al. [28] and Cecchetti et al. [5] address detection of exploits and vulnerabilities; those two pieces of work propose defenses, the former via patching and the latter via a new type system.

Finally, specifically as it pertains to unchecked low-level calls, there is work on simply replacing such calls entirely in the source-code [34]. However, that work has its own set of limitations, such as the inability to identify all low-level calls, and whether the replacement indeed results in a contract that is equivalent to the original.

3 Computational Complexity

Our approach, which we discuss in the next section, is based on model-checking. In this section, we articulate the underlying foundations in computational complexity. We have two sets of results: that in general, checking for unchecked low-level calls in Ethereum smart contracts is undecidable, and that under finitizing assumptions we make, it is \({\textbf {PSPACE}} \text {-complete}\). To establish these results, we appeal to the halting problem for smart contracts. We omit full proofs on account of lack of space, and provide sketches only.

As we mention in Sect. 1, EVM has been called quasi-Turing-complete, with the quasi- being attributed to the fact that a sequence of computations is bounded by a parameter called gas, and every computation consumes some gas. Therefore, assuming that the value for gas is bounded by a constant, EVM is not Turing-complete. However, in our observation, gas is not the only limiting factor in this regard. The only storage offered in EVM are: (i) a program counter, which is 256 bits, (ii) a stack, which has a maximum size of 1024, each entry 256 bits, (iii) memory, addressed by 256 bits, each entry 8 bits, and, (iv) storage, addressed by 256 bits, each entry 256 bits. As all of these are bounded by constants, we do not have the unboundedness required to encode an arbitrary Turing machine. However, as the number \(2^{256}\) is large, certainly from the standpoint of verification in practice, we may assume that any value whose upper-bound is \(2^{256}\) is unbounded.

Theorem 1

If we assume that any value bounded by \(2^{256}\) only is unbounded, and gas is unbounded, then whether a run of a contract encoded in EVM halts on some input is undecidable.

Our proof is by construction—we have an encoding of a Turing machine in Solidity, which then compiles to EVM. Our Solidity code uses the uint data type which is 256 bits for anything that may be unbounded, for example, the number of states, the size of the alphabet and the tape.

Corollary 1

Detecting unchecked low-level calls in EVM is undecidable.

We can reduce the halting problem that underlies the proof for Theorem 1 to the complement of the problem of finding unchecked low-level calls, and then appeal to the fact that a problem is undecidable if and only if its complement is undecidable. We wrap every instruction that causes a smart contract in EVM to stop running with a CALL instruction and a check of its return value. Thus, the contract halts if and only if the return value from the CALL is checked. There are only a few instructions in EVM that cause a contract to stop running, e.g., STOP and REVERT.

Theorem 2

Suppose the number of memory and storage locations allocated and accessed by a smart contract C is at worst polynomial in the size of C and in \(\log {g}\) where g is the value of the gas parameter. Then the problem of determining whether C terminates on some input is \({\textbf {PSPACE}} \text {-complete} \).

The proof for \({\textbf {PSPACE}} \text {-hardness}\) follows from an encoding of a Linear-Bounded Automaton (LBA) using our construction for a Turing machine to which we refer in the Proof for Theorem 1, and the fact that the halting problem for LBAs is \({\textbf {PSPACE}} \text {-hard}\). To prove that the problem is in \({\textbf {PSPACE}}\), we can construct a non-deterministic algorithm, denote it \(\mathcal {A}\), to decide the problem such that \(\mathcal {A}\) allocates at worst polynomial space only. We then appeal to Savitch’s theorem that \({\textbf {NPSPACE}} = {\textbf {PSPACE}} \) [29]. \(\mathcal {A}\) simply allocates the maximum space that may be needed and runs the smart contract on the input. For any uninitialized or unknown value, e.g., the return from a CALL to an external contract, \(\mathcal {A}\) chooses a return value non-deterministically. We know, from our assumptions, that the space \(\mathcal {A}\) must allocate is at worst polynomial in the size of the input.

We need the “\(\log \)” qualification for the input gas g for the reason that g may be encoded, for example, in binary, in which case, under the assumption that each instruction consumes constant gas, the contract C may run, on some input, for time \(\varTheta (g)\), and therefore may allocate space as much as \(\varTheta (g)\), where \(\varTheta (\cdot )\) represents a tight asymptotic bound.

Corollary 2

Determining whether an EVM contract has an unchecked low-level call under the finitizing assumptions of Theorem 2 is \({\textbf {PSPACE}} \text {-complete}\).

Similar to the proof for Corollary 1, we appeal to the fact that a problem is \({\textbf {PSPACE}} \text {-complete}\) if and only if its complement is \({\textbf {PSPACE}} \text {-complete}\).

4 Reduction

From Corollary 2 in the previous section, we have a sufficient condition for our problem to be \({\textbf {PSPACE}} \text {-complete}\), and therefore, reduction to model checking that is complete for \({\textbf {PSPACE}}\) is an appropriate approach. This is exactly what we do. Our implementation encodes the model checking problem in SMV [3], but more abstractly, the output of our reduction is a Kripke structure [17]. To us, a state is uniquely determined by the value of several variables that we associate with a state. We then define state-transitions between those states, some of which are non-deterministic. The variables are for the basic components of the state of an EVM: (1) The stack, memory and storage of the EVM, (2) the current stack-head, (3) the next instruction to be executed, and, (4) the gas remaining.

Program Counter. To keep track of the instructions, we have a variable which we call operationName, which is an enumeration of every instruction that appears in the EVM code that is input, annotated with its byte location. For example, suppose we have the EVM code 6005600401. This corresponds to:

figure a

Then, our reduction would introduce:

figure b

The begin and end are keywords we introduce to delimit the sequence of instructions. The state-transitions of the instructions would be the following (in SMV syntax, the value of the variable in the next state is shown after the colon, “: ”):

figure c

Unless we have a JUMP or JUMPI instruction, our program counter increments sequentially, i.e., by the number of bytes the previous instruction + operands consume. We maintain also an operationArray[ ], which identifies an operand, if any, in the EVM code for any operation. In the above example of two pushes and then an add, the argument to each of the two pushes is in the EVM code, and the argument to the add is not. Consequently, we would initialize and never change our operationArray[ ] as follows (0ud x ... is the SMV syntax for an unsigned x-bit value):

figure d

Stack, Memory and Storage. For each of the stack, memory and storage, we maintain arrays. For the stack we maintain also a stack_head. We also exercise a design choice on the size of each entry in the stack. While we can certainly set those to 256-bit, for most smart contracts we observe in the benchmarks, this is too large. We can estimate the size we need via a simple static analysis for the PUSH instructions, and other instructions such as ADD and MUL that increase the size of a stack-entry. In EVM code, the push instructions are PUSH1,...,PUSH32, where PUSH x means x bytes are pushed. For memory and storage, the situation is different. There are two memory-store instructions, MSTORE, which stores 256 bits, and MSTORE8, which stores 1 byte—we find only the former in the datasets we have adopted [9]. There is only one storage-store instruction, SSTORE, which stores 256 bits. However, the value stored in memory and storage comes from the stack. Therefore, we again rely on our estimate of the maximum size stored on the stack for the sizes to be stored in memory and storage. Also, with the stack, in EVM, the maximum number of entries is 1024. However, this is again an upper-bound that is often loose. A count of the number of PUSH instructions gives us a tighter upper-bound on the number of possible entries in the stack. We may need to account for the possibility that the same PUSH instruction may be executed more than once. This depends on the specific property we are checking for. For unchecked low-level calls, we know that we need to account for at most a constant number of executions of a PUSH instruction.

figure e

The memory_offsets and storage_keys are needed to identify to exactly which memory and storage locations a corresponding instruction refers. Consider, as an example, the following EVM code.

figure f

The MSTORE instruction takes two arguments, both off the stack: an offset or location within memory, and the value to be stored. In the above example, the offset is 0x40 and the value to be stored is 0x80. We would store the offset in some index, call it i, of memory_offsets[ ], and ensure that the value in memory[i] is the value stored at that offset. We discuss below under “Instructions that need more than one transition” as to the manner in which we handle the MSTORE, SSTORE and their corresponding load instructions via state-transitions in our model.

The stack[ ] array changes based on (i) the particular instruction, and (ii) the current value of the stack_head. Given the quirks of the SMV syntax, we need to enumerate every possible next-state value of each entry of the stack. Thus, if our stack[ ] was specified to be of 10 entries, indexed 0...9 as we show above, then, we would have a next(stack[i]) case statement for every \({\texttt {i}} = 0, \ldots , 9\). For example:

figure g

To explain the above, we first clarify that our stack grows downwards, i.e., as we push more items onto the stack, the value of stack_head increases. Also, the value of stack_head is the next entry in the stack; thus, for example, if stack_head is 6, then the top of at the stack is at 5, and we initialize stack_head to 0. Thus, in the above snippet, the only way the value in stack[6] can change in a state-transition is if the current value of stack_head is 6. If the instruction is \({\texttt {PUSH1}} \), then we get the value to be stored in stack[6] from our EVM code, which in turn is in our operationArray[ ] (see above for a discussion of this variable). If the instruction is DUP2, we need to make a copy of the value from one below the current head of the stack, i.e., stack[4], and store that in stack[6]DUP x pushes a copy of the value that is at depth x in the stack, where \(x = 1\) refers to the top of the stack, \(x=2\) one below the top and so on.

The stack_head also changes based on the instruction and the current value of stack_head. As we say above, the stack grows downwards, i.e., we increment the stack_head as items are added to the stack and decrement it as items are popped. The only other detail is that in our implementation, our stack is circular, and every change to the stack_head is performed modulo its maximum size. This has enabled us to quickly experiment with small stack-sizes albeit while risking correctness, specifically, false negatives.

figure h

In the above example, the maximum size of the stack has been set to 10, with the items indexed \(0,\ldots , 9\). The CALL instruction pops 7 items off the stack and pushes 1, for a net of 6 items popped. Consequently, we update the stack head to \({\texttt {stack\_head}} + 4 \mod 10\). The SWAP x instruction swaps item at depth \(x+1\) on the stack with the item at the top; it does not change stack_head. The POP instruction decrements stack_head by 1.

Instructions that Need One Transition Only. We observe that for most instructions in EVM, we require one state-transition in our model only. The simplest examples are those involving basic arithmetic, such as ADD and MUL which are supported directly in SMV. For example:

figure i

The above shows that if our instruction is MUL, then our stack decreases by a net of 1—MUL pops two items off the stack, multiplies them and pushes the result. Thus, if our current stack_head is 4, we multiply stack_head[2] and stack_head[3] and store the result in stack[2].

There are also numerous other examples of instructions that require one state-transition only in our model. For example, the CALL instruction is crucial for us in our empirical validation on unchecked low-level calls (see Sect. 5). For our purposes, modeling CALL is somewhat surprisingly straightforward. The instruction causes another contract to be called with optional arguments. If we return from the other contract, we get a return value. From our standpoint, CALL causes a net decrement of 6 to the stack, and the return value that is stored on the top of stack is chosen non-deterministically, because we do not know what value will be returned. In the example above, we have shown the manner in which stack_head changes with a CALL. The following shows the change to an entry in the stack.

figure j

The variable CALL_23_return_value is declared but never assigned a value; the model checker non-deterministically chooses a value for it.

Instructions that Need More than One Transition. For the other instructions, we require more than one transition in our model. Consequently, we need to be careful that any termination condition we specify to the model checker does not match an “intermediate” state because such as state would not exist in the EVM. Two examples of such instructions are JUMP (unconditional jump) and JUMPI (conditional jump). We adopt the mnemonic “_DUMMY_ ” to refer to the intermediate state. Below, we discuss JUMP; JUMPI is realized similarly. The only valid destination for a jump instruction is a byte that has the opcode JUMPDEST. Consequently, in our reduction, we only need to check to which JUMPDEST a particular JUMP or JUMPI seeks to jump in a particular instance. Suppose we have a JUMP instruction in the EVM code at byte #7, and a JUMPDEST at byte #12. Then, our operationName would be declared as:

figure k

For each JUMPDEST_ x, we introduce a boolean variable jump_destination_is_ x. Its value is initialized and changes as follows (as before, we assume that our stack has maximum size 10, indexed \(0,\ldots , 9\)):

figure l

That is, we set the value of jump_destination_is_12 to true if and only if the current operationName is a JUMP, and the value at the head of the stack is the same as the byte number of this JUMPDEST. The reason is that in EVM, JUMP ’s operand, which is the destination of the jump, is the value at the top of the stack. Note also that no two jump_destination_is_ x variables can be simultaneously true. If the instruction is JUMPI instead, we would check also whether the value immediately below the top of stack, i.e., stack[4] in our above example, is \(> 0\), because that is exactly where the condition for the JUMPI resides. We can then carry out the state-transition that effects the changes to operationName to the appropriate JUMPDEST. That is, we effect one state-transition to setup the correct destination for the jump (and check the condition, in the case of JUMPI), and a next state-transition to correctly update the operationName, i.e., our version of the program counter.

figure m

For MSTORE and SSTORE, we use “_DUMMY_ ” variables for a similar reason that we need to setup the location at which we store. The only difference between MSTORE and SSTORE is that the former stores a value at at offset within memory while the latter’s storage space is indexed by a key, often computed by exercising a cryptographic hash function, Keccak-256 or SHA-3. Corresponding to the hash function, EVM has an instruction, byte value 0x20, opcode KECCAK256 [33]. The computation of this hash value has itself been identified in some prior work as a source of difficulty in verifying smart contracts [12]—owever, we observe that while we could certainly implement either of those hash functions in SMV, we have simplified the work by realizing the Adler-32 checksum instead [20]. From our standpoint, there is no consequence except ease of implementation.

Gas. We capture gas in a variable, and use the various values on consumption of gas that are specified for EVM [33].

4.1 Unchecked Low-Level Calls

An unchecked low-level call is an invocation of a function of another contract from this contract. If and when that call eventually returns, a return value is pushed onto the stack of this contract. A contract is vulnerable if and only if there exists an instance of execution in which that return value is not checked. In Solidity, there are a number of ways to call another contract, e.g., call(), send() and staticcall(). In EVM code, there is one way only: the CALL instruction. Also, in Solidity, there are several different ways of checking the return value from such a call; e.g., using an if statement or by invoking the require() convenience function. In EVM code, the check is achieved using the ISZERO instruction. We adopt the algorithm in Fig. 2 to find unchecked low-level calls.

Fig. 2.
figure 2

Our algorithm for finding unchecked low-level calls.

In trim_contract(), we first find each instance of the CALL instruction. We then scan forward in the EVM code from that instruction till we hit an instruction that we consider terminating from the standpoint of our trimming: JUMP, JUMPI, STOP, RETURN, REVERT or another CALL. We expect there to be at least one ISZERO before we hit one of these instructions; if not, we report that the contract is vulnerable in Line (4) of the above algorithm. If we proceed past Line (4) for a particular snippet s, we know that in s, the CALL is followed by at least one ISZERO. Corresponding to each ISZERO in a snippet s, we ask whether it checks the return value from the CALL. Assume that we have an ISZERO in byte # 12 and another in byte # 35 of the snippet. We first generate a random value, 59061 in the example below. We then ask whether there exists a reachable state in which the top of the stack contains that value when we hit any ISZERO instruction that follows the CALL in the snippet. In the example below, the maximum stack-size is 10, with the entries indexed \(0,\ldots ,9\).

figure n

\({\texttt {CALL\_}} x{\texttt {\_return\_value}} \) is a variable we declare and allow the model checker to assign non-deterministically. If the model checker finds a counter example to the above LTLSPEC, then that is evidence that some instance of ISZERO that follows the CALL in the snippet s checks the return value from the CALL, and therefore no vulnerability exists. Otherwise, we know and report that a vulnerability exists. We recognize that there is a small probability of a false negative here because it is possible that the top of the stack takes the random value we generate, 59061 in the above example, not because it corresponds to the return value from the CALL, but on account of some other computations. We can simply repeat to exponentially decrease this probability.

Fig. 3.
figure 3

Table with statistics and our empirical results.

5 Empirical Validation

A model-checker that can take specifications in SMV as input is nuXmv [3], and that is indeed what we have used. We have employed it in bounded model-checking more, where we infer the bound from the size of each contract snippet we check. For our empirical validation, we use the curated and “wild” datasets of Durieux et al. [8]. Their curated dataset comprises smart contracts written in Solidity classified into the 10 categories of DASP [23]. One of these classes is unchecked low-level calls. Each contract in the curated set is labelled where the vulnerability exists. For example, the contract in Fig. 1 from Sect. 1 suffers from an unchecked low-level call in the line winner.send(winAmount).

When one considers the smart contracts from the entire curated set, we know that at least one contract in every file within the unchecked_low_level_calls subset contains the vulnerability. But, there may also be contracts in the curated set outside of that subset that suffer from the unchecked low-level calls vulnerability. Consequently, we organize our discussions below as follows. We first focus on files in the unchecked_low_level_calls subset of the curated set. As every one of those files contains the vulnerability, we cannot report any false positives. The question is whether we report any false negatives. We then consider the remainder of the files from the entire curated set. We report the manner in which our approach performs on them; in this case, both false positives and negatives are possible. Finally, we discuss our assessment on the “wild” dataset.

The table in Fig. 3 reports statistics on the datasets and our results. The first row of the table reports the number of smart contract files in each of the three datasets. The second row reports the number of files we assessed empirically against our implementation: it was all the files from the curated set, and 100 files chosen randomly from the “wild” set. The third row reports the total number of contracts in the Solidity code in each dataset, and the fourth row reports the total number of CALL instructions in the EVM code across all the contracts in the files we assessed. Thus, as we say in the table in the fourth row, we assessed more than 1000 contract snippets in the algorithm we discuss above. The fifth row, “# we determine vulnerable”, is the number of those instances of the CALL instruction that we deem to be vulnerable. The final two rows reports the number of false positives and negatives, which are both zero. Our unit of measurement for the number of false positives and negatives is at the granularity of a file.

Comparison to Other Tools. The work of Durieux et al. [8] allows us to compare our results with prior tools. Their work reports “Vulnerabilities identified per category by each tool,” (Table 5 in that work) from which we get a lower-bound on the number of false negatives for those tools. We observe that for unchecked low-level calls, every tool they studied suffers from a high number of false negatives—the best performer, Mythril [22], was able to detect only 5 of the total 12 instances of vulnerabilities. Our work is on a later, larger version of the dataset used in Durieux et al. [8] and we achieve zero false positives and negatives on the larger dataset.

Other Vulnerabilities. While the above results suggest that for unchecked low-level calls, our approach is highly effective, a natural question is whether we can extend it to address other commonly occurring classes of vulnerabilities in smart contracts. We think that the answer to this question is ‘yes’. For example, it is possible for us to adopt the reduction from Sect. 4 with a different algorithm and LTL property than the ones from Sect. 4.1 to address reentrancy [23]. We leave this for future work.

6 Conclusions and Future Work

We have taken a different mindset than existing work towards checking for security vulnerabilities in smart contracts. Rather than trying to check for several different kinds of such vulnerabilities and as a consequence, building a tool that suffers high numbers of false positive and negatives as has been observed for such tools, we validate the hypothesis that if we focus on a class of commonly occurring vulnerabilities only, we can build a tool that suffers zero false positives and negatives for real-world smart contracts. The class which is our focus is as computationally hard as any other class we may want to check for. Our empirical results on a publicly available benchmark are highly promising—our tool suffers zero false positives and negatives.

Some future work is to extend our reduction to be able to detect other classes of vulnerabilities that are of particular interest in the context of smart contracts, such as reentrancy and front-running [23]. There is also the question as to whether we can detect new classes of vulnerabilities that are not members of known, commonly occurring classes. More broadly, there is the question of identifying characteristics unique to smart contracts from the standpoint of security vulnerabilities in them.