Keywords

1 Introduction

Smart contracts and their decentralized algorithmic validation are emerging as a successful technology to implement agreements between mutually untrusted parties without relying on a centralized third authority. They are currently used in many critical domains, such as infrastructural systems and financial applications, therefore it is of paramount importance to study their correctness. In this work, we address this problem at the programming language abstraction level, so to statically rule out harmful patterns appearing in smart contracts code and support a safer programming discipline. More precisely, we focus on Solidity, the most widely used programming language in Ethereum’s ecosystem, and its type system, that is integrated in the language so to let the compiler statically enforce basic safety properties of smart contracts.

Our first contribution is the formalization of the semantics of the core of the Solidity language, that we call Featherweight Solidity (\(\textsc {FS}\)). The \(\textsc {FS}\) calculus focuses on contract instantiation, typed interaction among deployed contracts and money transfers. Even if important features like gas fees are omitted, the calculus provides a rather compact and clean model of key aspects of smart contract programming. Such a formalization indeed allows one to precisely define the behavior of many Solidity programs, so to describe undesired behaviors and investigate on a way to prevent them. This is a fundamental step for the development of analysis techniques that take advantage of formal methods to verify and reason about the safety properties of smart contracts’ source code, rather than acting at the level of EVM bytecode. Moreover, the formalization style of \(\textsc {FS}\) intentionally highlights the connection between objects and smart contracts, opening the way to adapt the rich theory of OOLs in the context of Solidity.

As a second contribution, we study the type system of \(\textsc {FS}\), in order to clarify, precisely state and, most importantly, prove, Solidity’s claim to be a type-safe language. In particular, we show that the Solidity static type system only detects a class of errors, whereas others are detected at runtime, such as accesses to a non existing function or state variable, or transfers to contracts that cannot accept money. That is, well-typed \(\textsc {FS}\) programs, hence also compiled Solidity contracts, may reach specific exceptional states that cause the current transaction to be interrupted and rolled-back, possibly leading to Ether indefinitely locked into a contract’s balance. Reverting an unsafe transaction guarantees the consistency of the blockchain, but the account that issued the transaction is not reimbursed for the money it paid to the miner node. Thus, it is of interest of anyone to issue a transaction only when there is a static guarantee that such a transaction will not evolve to a revert.

The main reason for the weakness of Solidity’s type safety lies in the fact that the code of contract functions can refer to contract instances through their public addresses, but the Solidity \(\texttt {address}\) type is essentially an untyped pointer, which is notoriously a very flexible but subtle feature. The newly released Solidity 5.0 version splits the address type so to distinguish contracts that can safely accept money transfers from those that would raise an exception. However, the new compiler is not able to statically prevent subtle workarounds, thus it is not type-safer than its previous version.

Our third contribution is a proposal for a safer refinement of the Solidity type system. We show that the enriched type system enjoys a stronger soundness property, so that the only possible runtime errors in \(\textsc {FS}\) remain those due to a negative account balance. In particular, cast expressions or money transfers that would lead to unsafe usage of contract members or calls to an undefined fallback function are now ruled out at compile-time. Moreover, we show that such a refinement can be actually made retro-compatible with original Solidity code, both 5.0 and previous versions. Hence, it is possible for contracts written in the extended safer language to interact with already deployed smart contracts.

The key idea is twofold: first, we refine the \(\texttt {address}\) type with type information about the contract it refers to. Secondly, we enrich the functions’ signatures so to allow functions to be called only by contracts whose address has an expected (super-)type. This additional information is particularly useful within contracts code to typecheck the implicit sender parameter, therefore, besides statically preventing runtime errors, the refined compiler statically prevents unsafe callback expressions, that are notoriously vulnerable Solidity programming patterns. To take advantage of the full power of the refined typing, the major effort required to Solidity programmers is to explicitly express the type constraint they require on contracts callers. However, this requirement actually supports a safer programming discipline, and we put forward a number of convenient function modifiers, in line with Solidity language style, so to enhance the use of its compiler as a convenient building tool.

Fig. 1.
figure 1

A simple Bank contract in Solidity

2 Background

Ethereum [5] is a decentralized platform that runs programs called smart contracts. Contract instances deployed on the Etherum blockchain are autonomous agents reminiscent of class-based objects in distributed OOLs. They are identified by a unique public address, hold an amount of virtual coins called Ether (balance), are given a persistent area in the blockchain where their state is stored, and are associated with their immutable executable code. Besides contracts, the blockchain also hosts Externally Owned Accounts (EOAs), that correspond to human agents registered to the Ethereum platform. Analogously to smart contracts, EOAs are identified by a unique address and hold an amount of Ether as their balance, but they have no associated code. EOAs start programs by issuing a transaction, which either deploys a new contract instance, or invokes a function by sending a message to a contract stored at a given address. Typically, transactions include input data for the invocation, the address of the sender EOA, an amount of virtual money to be transferred to the contract as a sort of payment, and a fee (gas) to reward the miner node that executes the transaction.

While EOA’s initial transactions are written using one of the many API available in Ethereum ecosystem, smart contracts code is commonly written using the Solidity programming language [1], and it is compiled into bytecode running on the Ethereum Virtual Machine (EVM) [14]. As in OOLs, Solidity contracts contain state variables and functions that, as objects methods, can refer to the currently executing contract instance through the variable \(\texttt {this}\). Contract functions can send messages to other (or to the current) contracts, possibly also specifying an amount of virtual money and the gas fee to be paid. Therefore, besides \(\texttt {this}\), in Solidity the contract functions have access to the implicit variable msg, which stores various information about the current call, such as the address of the caller ( ) and the amount of money sent along with the call ( ).

As an example, Fig. 1 shows a Solidity smart contract that implements a very simple bank. The amounts state variable is a mapping that records the amounts of money deposited by clients, either EOAs or smart contracts, indexed by their Ethereum addresses. To withdraw money from a Bank instance b, the invocation of the corresponding function should have the standard shape b.withdraw(n). The function body first of all checks whether the caller’s bank account contains enough money. If not, an exception (revert) is thrown by the runtime and the current transaction is rolled-back, leaving the blockchain as if it had never run. If the caller has enough money, then its bank account is decremented by n, and, moreover, n Wei’s (Ether’s smallest sub-currency) are transferred from the balance of b to the balance of the caller by explicitly using the transfer primitive. Whenever the caller is a contract, the EVM requires that contract to contain a definition for the so-called fallback function, otherwise a revert is thrown and the transaction is reverted. The typical purpose of such function is either to track the reception of Ether or to refuse it by throwing an exception. On the contrary, when the recipient of the transfer is an EOA, no fallback function is needed.

The invocation of the deposit function, instead, can have the special shape , binding n to the implicit parameter . If not specified, n is assumed to be 0. As a consequence, n Wei’s are transferred from the balance of the caller (msg.sender) to the balance of the Bank instance; in this case, no explicit invocation of transfer is needed. The state variable amounts is updated accordingly. Analogously to the case above, the caller must hold enough money in its balance. The additional value argument can only be specified for a function with the payable modifier, meaning that it is allowed to receive Ether as part of the invocation, otherwise a revert would be thrown at invocation time.

3 The Featherweight Solidity Calculus

In this section we introduce Featherweight Solidity (\(\textsc {FS}\)), a calculus formalizing the core of the Solidity programming language. Many features are omitted, like low level calls (using the primitives send, call, delegatecall), expressive value types like mappings and first-class function values, function modifiers and multiple inheritance. Furthermore, FS models single transactions, thus it does not deal with the concepts of blocks, distributed block validation, and roll-back of the changes to the blockchain caused by a reverted transaction. We also do not model the concept of gas fees, which is a mechanism Ethereum uses to make sure that every transaction eventually terminates and to prevent denial of service attacks.

In such way, we can focus on key aspects of smart contract programming, such as contract instantiation, interactions among deployed contracts, and money transfers, providing a rather compact and clean model of such features. In particular, the definition of \(\textsc {FS}\) is inspired by Featherweight Java (\(\textsc {FJ}\)) [9], the reference calculus for Java-like languages, exploiting the similarities and highlighting the differences between the notions of object and smart contract. Therefore it opens the way to reuse and adapt the rich and well-known theory of OOLs in the context of smart contracts.

Fig. 2.
figure 2

\(\textsc {FS}\): syntax

Syntax and types are given in Fig. 2. We assume sets of variables \( x \), \( y \), contract names \( C \), \( D \), state variable names \( s \), function names \( f \), addresses a. We assume three special variables \(\texttt {this}\), , , a special contract name \(\texttt {Top}\), and a special function name \(\texttt {fb}\), all explained below. We let a metavariable ending by \( s \) to be implicitly defined as a (possibly empty) sequence, for example \( cds \) is defined by \( cds \,{:}{:}\!\!=\epsilon \mid cd \ cds \), where \(\epsilon \) denotes the empty sequence.

A contract table is a sequence of contract declarations, consisting of contract name, parent contract’s name, a sequence of state variable declarations and a sequence of function declarations. We only model single inheritance and assume a distinguished contract name \(\texttt {Top}\) with no state variable and function declarations. A function declaration consists of a return type, a function name, a list of typed parameters, and a body which is an expression. Since \(\textsc {FS}\) does not model Solidity’s function modifiers, every function is implicitly marked payable and external, that is, can receive Wei and can be invoked by EOAs’ transactions. We assume a special function name \(\texttt {fb}\), which models the fallback function, implicitly invoked whenever money is transferred by means of a transfer call. Therefore, if present in a contract definition, the function \(\texttt {fb}\) must be necessarily declared as . As in \(\textsc {FJ}\), we assume for each contract declaration a canonical constructor.

Expressions includes variables, the only constant \(\texttt {u}\) of type \(\texttt {unit}\), natural constants \({\textsf {n}}\) of type \(\texttt {uint}\), addresses, used to refer to EOAs and contracts already deployed in the blockchain, access and assignment to a state variable, and block consisting of a local variable declaration and a body. We use as an abbreviation for with \( x \) not free in \( e '\).

The expression invokes the function f on the contract instance denoted by e, specifying the address \( e ^\textsf {s}\) of the contract instance (or the EOA) that invoked the function, and the amount \( e ^\textsf {v}\) of Wei sent along with the call. In the instantiation expression , the two additional arguments have an analogous meaning.

Assuming that \( e \) evaluates to a contract instance, returns its address, while, assuming that \( e \) evaluates to an address, returns its current balance, and the cast expression \( C ( e )\) returns the corresponding contract instance. The expression , assuming that \( e \) evaluates to an address, transfers the amount of Wei denoted by \( e ^\textsf {v}\) from the balance of \( e ^\textsf {s}\) to its balance. Finally, the \(\texttt {revert}\) expression aborts the current transaction. For the aims of our formalization, we add a label \(\lambda \) describing the specific error (\({\texttt {neg}}\) when a money transfer would make an account’s balance negative, \(\texttt {rte}\) for a runtime type error), omitted when not significant.

Fig. 3.
figure 3

A simple Bank contract in FS

For simplicity, \(\textsc {FS}\) expressions model both Solidity code, that is, smart contracts code, and external code issuing the initial transactions. However, only the latter requires an explicit sender argument in function calls, contract instantiation and money transfer, whereas, in contracts code (that is, in function bodies rather than at top level), the (implicit) sender is always the currently executing contract instance. Formally, we assume that function calls occurring in function bodies have shape , abbreviated , and analogously for constructor invocations and transfer.

The syntax of types includes contract names, the \(\texttt {unit}\) type, the type \(\texttt {uint}\) of unsigned integers, and the type of addresses. In the definition of \(\textsc {FS}\) we privileged uniformity, therefore a \(\textsc {FS}\) program is not an executable Solidity program, for instance, Solidity has no \(\texttt {unit}\) type. However, the correspondence is very close. In Fig. 3 we showFootnote 1 the \(\textsc {FS}\) code corresponding to the Solidity smart contract in Fig. 1. As an example, denotes a transaction issued by the EOA with address ’0xu7e’ to interact with an instance of the Bank contract stored at address ’0x84b’.

Fig. 4.
figure 4

\(\textsc {FS}\): operational semantics

Operational Semantics. Runtime expressions include, besides source-level constructs in Fig. 2, contract references \({\iota ^{ C }_{ D }}\), where \( C , D \) are contract names. We write \(\iota _{ D }\) as abbreviation for \({\iota ^{ D }_{ D }}\), and omit both when not relevant. When a contract \( D \) is instantiated, a new reference \(\iota _{ D }\) is created with its contract’s name built in (as subscript). When a cast to type \( C \) occurs at runtime, no type check is performed by the EVM, but the execution proceeds by recording the target type (as superscript) in the contract reference. That is, \({\iota ^{ C }_{ D }}\) is a reference to an instance of contract \( D \) (for “dynamic type”) that has been cast (that is, statically typed) to type \( C \) (for “cast type”). Note that a contract reference keeps its dynamic type forever, whereas it can be used with different cast types.

Configurations, ranged over by \( c \), are pairs \(\langle {{ e },{\beta }}\rangle \), where \( e \) is the expression to be evaluated and \(\beta \) the blockchain, that stores the global state of the system. Formally, \(\beta \) is finite map from contract instances of shape \(\langle {{\iota _{}},{ a }}\rangle \) to pairs \(\langle {{ vs },{{\textsf {n}}}}\rangle \), where \({\textsf {n}}\) and \( vs \) are the contract’s balance and state, respectively, the latter being the tuple of the current values of the state variables. Note that, while the reference records type information, the public address provides an “untyped” way to access a contract instance. As in Ethereum, we assume a one-to-one correspondence between references and addresses in the domain of \(\beta \), so we can safely use the notations \(\beta (\iota _{})\) and \(\beta ( a )\) as abbreviations for \(\beta (\langle {{\iota _{}},{ a }}\rangle )\). Since Ethereum blockchain records also EOAs addresses and balances, in order to let the map \(\beta \) uniformly deal with both smart contracts and EOAs, we assume that each EOA has a corresponding reference to an instance of a dummy contract \(\texttt {EOContract}\) whose code only contains a fallback function \(\texttt {fb}\) with empty body (i.e., unit fb {return u;}). Values are contract references and constants of the other types.

Evaluation contexts formalize standard left-to-right evaluation (for brevity we do not explicitly list all cases).

The small-step reduction relation over configurations \(\longrightarrow _ CT \) is parameterized by a contract table, omitted to lighten the notation. Reduction rules are collected in Fig. 4, where we use the following notations, whose trivial formal definition is omitted. Given a blockchain \(\beta \): in \(\beta [\iota _{}.i{\texttt {=}} v ]\), the value of the i-th state variable of the contract instance \(\iota _{}\) has been replaced by \( v \); in \(\beta [\langle {{\iota _{}},{ a }}\rangle {\leftarrow }{ vs }]\), a new contract instance \(\langle {{\iota _{}},{ a }}\rangle \) has been added with state \({ vs }\) and balance 0; in , an amount of \({\textsf {n}}\) Wei has been transferred from the balance of the contract instance at address \( a \) to that at \( a '\). If \(\beta (\langle {{\iota _{}},{ a }}\rangle )=\langle {{ v _1\ldots v _n},{{\textsf {n}}}}\rangle \), then we write to denote \( v _i\), for \(i\in 1..n\), and we write to denote \({\textsf {n}}\). The expression \( e [ v / x ]\) is obtained from \( e \) by replacing all occurrences of \( x \) by \( v \). The following auxiliary functions are implicitly parameterized on the contract table: \(\textsf {svar}( C , s )\) and \(\textsf {svartype}( C , s )\) return the index and type, respectively, of the state variable \( s \) in \( C \), if any; \(\textsf {svars}( C )\) returns the sequence of all (inherited and directly declared) state variables of \( C \); \(\textsf {ftype}( C , f )\) and \({\textsf {fbody}( C , f )}\) return the function type, of shape \(\langle {{ T },{ T _1\ldots T _n}}\rangle \), and the pair parameters-body, respectively, of the function \( f \) in \( C \), if any, looked for in \( C \) first, then in its parent contract. Finally, the subtyping relation \( C \le D \) is the reflexive and transitive closure of the inheritance relation. Subtyping is extended to function types as usual: \(\langle {{ T },{ T _1\ldots T _n}}\rangle \le \langle {{ T '},{ T '_1\ldots T '_n}}\rangle \) if \( T '_i\le T _i\), for \(i\in 1..n\), and \(T\le T '\).

Rules (ctx) and (ctx-revert) are straightforward. In rule (access) the semantics is as expected, with an additional check that the state variable s exists in both contracts C and D and that the type obtained at runtime is a subtype of that statically computed from the cast type. Otherwise, a \(\texttt {revert}[\texttt {rte}]\) is raised, see rule (access-rte), whose side condition is intended to cover also the case where s is not defined in both contracts. A symmetric check is performed in rules (assign) and (assign-rte). Indeed, as mentioned above, in Solidity no runtime checks are performed in a cast, but they are postponed when the reference is actually used.Footnote 2 This is modeled in rule (cast), where an address is converted to the corresponding reference. The runtime effect is just to tag the reference with the static type for future usage checks; in particular no subtyping constraint, like \( D \le C \), is enforced. In rule (dec), local variable declarations have the standard substitution semantics. Rules (get-addr) and (get-bal) are straightforward. In rule (new), a fresh instance of \( C \) is added in the blockchain, with state and balance initialized to the tuple of values and amount provided as arguments of the constructor invocation. Furthermore, the balance of the contract instance at address \( a ^{\textsf {s}}\) is decremented of the same amount, provided that this would not make the balance negative, otherwise a \(\texttt {revert}[{\texttt {neg}}]\) is raised, see rule (new-neg).

In rule (invk), the parameters and body of the function \( f \) defined in the contract of the receiver are retrieved from the contract table, through the auxiliary function fbody. Analogously to rules (access) and (assign), a check is performed that the function f exists in both contracts C and D and the function type obtained at runtime is a subtype of that statically computed from the cast type, otherwise a \(\texttt {revert}[\texttt {rte}]\) is raised, see rule (invk-rte). The invocation is reduced to the function body where \(\texttt {this}\) and formal parameters have been replaced by the receiver and the arguments \( vs \), as in standard \(\textsc {FJ}\), and, moreover, and have been replaced by address \( a ^{\textsf {s}}\) and amount \({\textsf {n}}\), respectively. Finally, the balance of the contract instance at address \( a ^{\textsf {s}}\) is decremented of the same amount, provided that this would not make the balance negative, otherwise a \(\texttt {revert}[{\texttt {neg}}]\) is raised, see rule (invk-neg). While functions are invoked on contract references, the transfer construct is used with addresses. In rule (transfer), an amount of \({\textsf {n}}\) Wei is transferred from the balance of the contract instance at address \( a ^{\textsf {s}}\) to that at \( a \), provided that this would not make the sender balance negative, otherwise, a \(\texttt {revert}[{\texttt {neg}}]\) is raised, see rule (transfer-neg). Moreover, the fallback function is implicitly invoked, if any, otherwise a \(\texttt {revert}[\texttt {rte}]\) is raised, see rule (transfer-fb).

4 Type System

The typing judgment has shape \(\varGamma ;\mathcal{I};\mathcal{A}\vdash e : T \), where \(\varGamma \) is a finite map from variables to types, \(\mathcal{I}\) and \(\mathcal{A}\) are sets of references and addresses, respectively. As for the reduction relation, it is implicitly parameterized by a contract table.

Typing rules are given in Fig. 5; they are mostly straightforward. Note that rule (t-ref) assigns the static type of a contract reference by looking at its superscript. According to the semantics of cast, rule (t-cast) just checks that the expression to be cast has type \(\texttt {address}\) without performing any additional type check. The typing judgment is extended to configurations in rule (t-conf), requiring that both the expression to be evaluated and the blockchain are well-typed according to the same sets of addresses and contract references. Moreover, the expression should contain no free variables. The judgment \(\mathcal{I};\mathcal{A}\vdash \beta \) holds if:

\( a \,{\in }\,\mathcal{A}\) iff \( a \,{\in }\,\textsf {dom}(\beta ),\ \) \(\iota _{}\,{\in }\,\mathcal{I}\) iff \(\iota _{}\,{\in }\,\textsf {dom}(\beta ),\) and \(\beta (\iota _{ C })=\langle {{ v _1,..., v _n},{{\textsf {n}}}}\rangle \) implies \(\textsf {svars}( C )= T _1\, s _1... T _n\, s _n\) and, for all \({i\in } 1..n\), \(\emptyset ;\mathcal{I};\mathcal{A}\vdash v _i: T '_i\) with \( T '_i\le T _i\).

Fig. 5.
figure 5

\(\textsc {FS}\): typing rules for expressions and configurations

Finally, the judgment \(\mathcal{I};\mathcal{A}\vdash CT \) means that the contract table is well-formed w.r.t. existing contract references and addresses. We omit the complete formal definition, reported in [6], since it is essentially as in \(\textsc {FJ}\). Informally, it ensures that all used contract names are declared, the inheritance relation is acyclic, there is no state variable hiding, no function overloading and safe function overriding. Moreover, each function definition should be well-typed in the following sense: if \(\textsf {ftype}( C , f )=\langle {{ T },{ T _1\ldots T _n}}\rangle \) and \({\textsf {fbody}( C , f )}=\langle {{ x _1... x _n},{ e }}\rangle \) then \(\texttt {this}{:} C ,\, \texttt {msg.sender}{:}\texttt {address},\, \texttt {msg.value}{:}\texttt {uint}, x _1: T _1,.., x _n: T _n ;\emptyset ;\mathcal{A}\vdash e : T '\) with \( T '\le T \). Notice that the previous judgment assumes an empty set of references since the code of contract functions can refer to contract instances and EOAs only by means of (public) addresses.

We write \(\longrightarrow ^\star \) for the reflexive and transitive closure of \(\longrightarrow ^{}\) and if there is no \( c '\) s.t. \( c \longrightarrow ^{} c '\). In the theorem we implicitly assume that the underlying class table is well-formed w.r.t. \(\mathcal{I}\) and \(\mathcal{A}\).

Theorem 1

(Soundness). If \(\mathcal{I};\mathcal{A}\vdash c : T \), \( c \longrightarrow ^\star c '\), and , where \( c '=\langle {{e'},{\beta '}}\rangle \), then either \( e '\) is a value or \( e '=\texttt {revert}[\lambda ]\) for some \(\lambda \).

This soundness theorem states that the Solidity type system prevents stuck execution, but not runtime type errors. This is quite dangerous and can lead to Ether indefinitely locked into a contract or to unexpected runtime \(\texttt {revert}\)s.

For instance, consider a blockchain storing at address \( a _B\) an instance of the Bank contract in Fig. 3, and at address \( a _D\) an instance of a contract D that does not define a fallback function. Assume that the contract at \( a _D\) successfully deposited 100 Wei in the bank, and now wants to withdraw part of them. The function call successfully compiles, but reduces to that raises a \(\texttt {revert}[\texttt {rte}]\) exception since \( a _D\) refers to a contract that does not define a valid fallback function. Therefore, the withdrawal transaction aborts, causing loss of gas fee in the real Ethereum scenario. Moreover, since deployed contract code cannot be updated, the money already deposited by \( a _D\) in the bank \( a _B\) is indefinitely locked.

The whole problem lies in the way the \(\texttt {address}\) type is handled: neither Solidity nor the EVM provides additional information on the contract stored at that address. Solidity addresses represent an untyped way to access contract instances, much as void * pointers in C. Such pointers allow extreme flexibility, but they are really difficult to deal with, since programmers have to know what they are doing and how to do so, in order to avoid subtle bugs.

5 Refined Type System

This section refines the type system of \(\textsc {FS}\) in order to more safely access contract instances through their address. Indeed, the resulting type system enjoys a more powerful soundness property, that is, well-typed programs never reduce to a \(\texttt {revert}[\texttt {rte}]\) exception. The key idea is to enrich the \(\texttt {address}\) type so to type information about the contracts the addresses refer to. That is, \(\texttt {address}\langle C \rangle \) is the type of the addresses of instances of the contract \( C \). This richer type is mostly useful when typing the implicit variable, used in function bodies to refer to the address of the caller. Indeed, well-typed expressions such as or reduce to a \(\texttt {revert}[\texttt {rte}]\) exception if is bound to the address of a contract that has not type \( C \) or has no fallback function. Moreover, by enriching the contract functions’ signatures with the address type of the implicit sender parameter, we can let the compiler check the safety of callbacks expressions similar to the ones above, that are notoriously vulnerable Solidity programming patterns.

Formally, the refined calculus, called \(\textsc {FS}^+\), is obtained by applying the changes in Fig. 6 to the syntax of \(\textsc {FS}\). In function declarations, the metavariable \( S \) (for “sender”) ranges over contract names, and the meaning is that the function f can be called only by contracts or EOAs whose address has (a subtype of) type \(\texttt {address}\langle S \rangle \). The subtyping relation is extended to address types in covariant way, that is, \(\texttt {address}\langle C \rangle \le \texttt {address}\langle D \rangle \) holds if \( C \le D \).

Fig. 6.
figure 6

\(\textsc {FS}^+\): changes to syntax and typing rules

The typing rules of \(\textsc {FS}^+\) are obtained by applying the changes in Fig. 6. Moreover, in the typing judgement, \(\mathcal{A}\) is no longer a set, but a map from addresses to contract names. The judgment \(\mathcal{I};\mathcal{A}\vdash \beta \) must additionally require that if \(\langle {{\iota _ C },{a}}\rangle \in \textsf {dom}(\beta )\) then \(\mathcal{A}(a)= C \). Finally, function types become triples, \(\langle {{ T },{ S },{ T _1\ldots T _n}}\rangle \le \langle {{ T '},{ S '},{ T '_1\ldots T '_n}}\rangle \) additionally requires \( S '\le S \), and the requirement on well-formedness of function bodies becomes the following: if \(\textsf {ftype}( C , f )=\langle {{ T },{ S },{ T _1\ldots T _n}}\rangle \) and \({\textsf {fbody}( C , f )}=\langle {{ x _1... x _n},{ e }}\rangle \) then \(\texttt {this}{:} C , \texttt {msg.sender}{:}\texttt {address}\langle S \rangle , \texttt {msg.value}{:}\texttt {uint}, x _1: T _1,.., x _n: T _n;\emptyset ;\mathcal{A}\vdash e : T '\) with \( T '\le T \). The refined rule (t-cast) now statically checks that the expression to be cast evaluates to the address of an instance of contract \( D \) which is a subtype of the target of the cast. In rules (t-invk) and (t-transfer), the additional side condition requires the type of the sender \( e ^\textsf {s}\) to be a subtype of the type S of the expected caller of the function f and \(\texttt {fb}\), respectively, as specified in their refined signature.

The type system of \(\textsc {FS}^+\) enjoys a stronger soundness property: \(\texttt {revert}[\texttt {rte}]\) errors are statically prevented, so the only possible runtime errors remain those due to a negative account balance. In other terms, cast expressions or money transfers that would lead to unsafe usage of contract members or calls to an undefined fallback function are now ruled out at compile-time.

Theorem 2

(Soundness). If \(\mathcal{I};\mathcal{A}\vdash c : T \), \( c \longrightarrow ^\star c '\), and , where \( c '=\langle {{ e '},{\beta '}}\rangle \), then either \( e '\) is a value or \( e '=\texttt {revert}[{\texttt {neg}}]\).

By taking advantage of this more powerful typing, the Bank contract in Fig. 3 can be refined into the following safer smart contract:

figure e

We assume a contract \(\mathtt {Top_{fb}}\) which only contains a \(\texttt {fb}\) function with empty body and \(\texttt {Top}\) sender parameter. Address types used in the mapping to index the banks’ clients refer to such contract name. This type is also used in the refined signature of the two contract functions, so to (statically) ensure that their caller contract actually provides a fallback function. Therefore, coming back to the example discussed in Sect. 4, if \(a_B:\texttt {address}\langle \texttt {Bank}\rangle \) and \(a_ D :\texttt {address}\langle D \rangle \) where the contract \( D \) has no fallback function, the function call does not compile anymore, since the new rule (t-invk) requires \(D\le \mathtt {Top_{fb}}\), which is not true. The runtime error occurring when trying to tranfer money to \(a_D\) is then statically prevented. Similarly, the contract stored at \(a_D\) cannot even call the deposit function, thus preventing also the deposit of money that cannot be withdrawn anymore. We remark that the type \(\texttt {address}\langle \mathtt {Top_{fb}}\rangle \) has the same meaning of Solidity 5.0’s new type \(\texttt {address}\,\,\texttt {payable}\), that is the (super)type of every contract that can safely accept money transfers. However, in Solidity 5.0 the variable is always assumed to be of type \(\texttt {address}\,\,\texttt {payable}\), and no check is performed at compile time to ensure that the actual sender has a fallback function. Therefore, differently form \(\textsc {FS}^+\), the Solidity compiler 5.0 does not enjoy a better soundness property that of Sect. 4.

The introduction of the type \(\texttt {address}\langle C \rangle \) and the corresponding typing rules, are of course incompatible with the legacy Solidity code, that would not be accepted anymore by the new compiler. Nonetheless, a direct default mapping is easily definable by mapping each occurrence of the type \(\texttt {address}\) to \(\texttt {address}\langle \texttt {Top}\rangle \) and by refining each function signature so to use \(\texttt {Top}\) as supertype of the function’s sender. We shall also provide a flag in the new compiler to disable the refined rule (t-cast) when \(D=\texttt {Top}\) and use the standard rule (t-cast) of Sect. 4. Indeed, the refined rule would rule out any cast having \(\texttt {address}\langle \texttt {Top}\rangle \) as actual type of e, since for all type C, \(\texttt {Top}\not \le C\). Cleary, by using such a default mapping, no additional guarantees can be statically checked on the contracts code, however, retro-compatibility with Solidity contracts already deployed on the blockchain, whose code cannot be updated, is guaranteed.

To take advantage of the full power of the refined typing, the major effort required to Solidity programmers is to annotate each function with the required (super)type of the caller. We then put forward a couple of new convenient annotations, in line with the Solidity programming style, that provides a number of modifiers to annotate functions, e.g., the payable marker in Fig. 1. Since it is often the case that type constraints refer to contracts that provide (at least) a fallback function, the keyword payableaddress can be introduced as a syntactic sugar for the type \(\texttt {address}\langle \mathtt {Top_{fb}}\rangle \), and the function marker payback can be used to indicate that the function potentially sends Ether back to its caller. Therefore, the Solidity Bank contract given in Fig. 1 could be simply rewritten into the following code, where function bodies are as in Fig. 1:

figure g

Instead, to enforce type-safe callbacks in functions code, programmers are required to explicitly express the type constraint they require on contracts callers. However, this requirement actually supports a safer programming discipline.

6 Conclusions and Related Work

We developed semantic foundations of smart contract programming, by formalizing the core of the Solidity language and type system. The \(\textsc {FS}\) calculus allows one to precisely define the behavior of smart contract programs and clarifies the type soundness of the Solidity compiler, pointing out its limitations. Thus it represents a fundamental building block to develop automatic program analysis tools. We then put forward a refined type discipline that statically captures a larger class of errors, such as unsafe casts, unsafe callbacks and unsafe money transfers. We discussed how such extension provides a safer programming discipline that is retrocompatibile with smart contracts already deployed on the blockchain. Finally, the \(\textsc {FS}\) calculus highlights the connection between objects and smart contracts, thus opening the way to reuse the type theory of OOLs in the context of Solidity, and dually to adapt the refined typing of \(\textsc {FS}^+\) to the case of distributed objects.

A number of proposals have been developed to improve the security and correctness of Ethereum smart contracts. A stream of works, e.g., [3, 7, 8], addresses the problem at the bytecode level: the semantics of EVM bytecode is formalized and smart contracts properties are verified by means of static analysis tools operating on the corresponding bytecode. Among the ones addressing the problem at the programming language level, Zeus [10] translates Solidity code into LLVM bytecode [11], leveraging abstract interpretation and symbolic model checking analysis techniques. SmartCheck [13], instead, attempts to detect vulnerabilities representing Solidity code as an XML tree, and then running XPath queries on it. Contracts code is fully covered, but the use of XPath leads to a higher rate of false positives. However, these tools are based on limited formal foundations of the language they operate on, and they come into play when a contract is fully defined. We rather think that by enhancing the Solidity compiler’s ability to statically rule out harmful code, we support a safer programming discipline, where programmers can write smart contracts that are (more) correct by construction. The work presented in [2] operates in this direction, and provides a preliminary compiler extension encoding Solidity code into SMT formulas to check simple properties, such as the division by zero. Similarly, the tool developed in [12] encodes a subset of Solidity into SMT formulas and uses symbolic model checking to verify some properties about smart contracts behaviour, including temporal ones. The first attempt to formalize Solidity is presented in [4]. In this work a small subset of Solidity is translated into F*, whose type system is afterwards used to detect vulnerable patterns, such as reentrancy. Even though the results are encouraging, the subset of Solidity is too small (neither transfer or cast expressions are considered), and an external language, F*, is used. To the best of our knowledge, this paper, together with its preliminary version [6], is the first work aiming at directly formalizing the semantics and the type soundness of the Solidity source code, so to enhance the use of its compiler as a convenient building tool.