
1 Introduction

Bitcoin [Nak08] is the first fully decentralized digital currency, introduced in 2008 and launched in 2009. It innovatively combines cryptographic techniques with economic incentives to make rational participants likely to play by the rules. Bitcoin gained significant traction, reaching $80 billion market capitalization in September 2017. Hundreds of alternative cryptocurrencies based on a similar general design have appeared since Bitcoin’s launch. Programming languages in early blockchains, e.g., the Bitcoin scripting language, were deliberately limited to reduce complexity for the sake of security.

Ethereum [VB+14, Woo14], announced in 2014 and launched in 2015, aims at creating a universal blockchain-based application platform. It incorporates a Turing complete language, making it theoretically possible to express all practical computations in smart contracts – pieces of code permanently stored on the blockchain and capable of responding to users’ requests. This enhanced functionality introduces new security challenges related to language design and secure programming practices.

Ethereum is not the only smart contract blockchain system [BP17]. Ethereum Classic [Eth17c] is an alternative blockchain originating from a controversial Ethereum update. Rootstock [Roo17] and Qtum [Qtu17] aim at implementing smart contracts in combination with the Bitcoin blockchain. Chain [Cha17a], Corda [Cor17], and Hyperledger [Hyp17] propose permissioned (i.e., with a fixed set of approved participants) smart contract blockchains, designed to simplify transactions between corporate entities.

This paper focuses on Ethereum as the most mature open blockchain with Turing complete programming capabilities. We summarize the state of knowledge and outline the research perspectives in this rapidly developing field. We assume familiarity with the basic blockchain concepts; [BMC+15, TS15] provide the necessary background.

2 Technical Overview

2.1 State and Accounts

Ethereum can be thought of as a state machine. Nodes of the Ethereum peer-to-peer network maintain a shared view of the global state. A user interacts with the network by issuing a transaction representing a valid state transition. Nodes pick transactions from the mempool (the set of unconfirmed transactions), verify their validity, perform the corresponding computation (possibly changing ownership of units of ether, the native Ethereum cryptocurrency), and update the state. There are two types of accounts in Ethereum: externally owned accounts, controlled by a private key, and contract accounts, controlled by a smart contract (a piece of code deployed on the blockchain).

The account state consists of the following fields:

  • nonce – the number of transactions sent by this account (for externally owned accounts) or the number of contract creations made by this account (for contract accounts);

  • balance – the number of wei Footnote 1 owned by this account;

  • storageRoot – Merkle Patricia tree root of this account’s storage;

  • codeHash – hash of this account’s contract bytecode.

An account’s 160-bit addressFootnote 2 is derived from its public key or, in the case of a contract account, from the address of the contract’s creator and the creator’s nonce [eth16]. The global state maps addresses to account states. The primary data structure in Ethereum is the Merkle Patricia tree, a radix tree optimized for key-value mappings with 256-bit keys [VBR+17, Buc14]. The root hash authenticates the whole data structure. Key-value pairs can be looked up and modified in logarithmic time.
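The address derivation can be sketched as follows. The sketch assumes the usual rules: an externally owned address is the last 20 bytes of the Keccak-256 hash of the 64-byte public key, and a contract address is the last 20 bytes of the Keccak-256 hash of the RLP encoding of `[creator, nonce]`. Python's standard library has no Keccak-256. `hashlib.sha3_256` is the standardized SHA-3, which pads differently, so it is used here only as a labeled stand-in, and the resulting addresses are not real Ethereum addresses. The RLP helpers are minimal and handle only the short items needed here.

```python
import hashlib

def rlp_encode_bytes(b: bytes) -> bytes:
    """RLP-encode a short byte string (< 56 bytes)."""
    if len(b) == 1 and b[0] < 0x80:
        return b                         # a single low byte encodes as itself
    if len(b) < 56:
        return bytes([0x80 + len(b)]) + b
    raise ValueError("long strings are not handled in this sketch")

def rlp_encode_list(encoded_items: list[bytes]) -> bytes:
    """RLP-encode a list of already-encoded items (short payloads only)."""
    payload = b"".join(encoded_items)
    if len(payload) < 56:
        return bytes([0xC0 + len(payload)]) + payload
    raise ValueError("long lists are not handled in this sketch")

def int_to_min_bytes(n: int) -> bytes:
    """Big-endian bytes without leading zeros; 0 becomes the empty string."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")

def stand_in_keccak256(data: bytes) -> bytes:
    # ASSUMPTION: SHA3-256 stands in for Keccak-256 (different padding, different output).
    return hashlib.sha3_256(data).digest()

def eoa_address(public_key: bytes) -> bytes:
    """Last 20 bytes (160 bits) of the hash of a 64-byte uncompressed public key."""
    assert len(public_key) == 64
    return stand_in_keccak256(public_key)[-20:]

def contract_address(creator: bytes, nonce: int) -> bytes:
    """Last 20 bytes of the hash of RLP([creator address, creator nonce])."""
    assert len(creator) == 20
    encoded = rlp_encode_list([rlp_encode_bytes(creator),
                               rlp_encode_bytes(int_to_min_bytes(nonce))])
    return stand_in_keccak256(encoded)[-20:]
```

Because the creator's nonce is part of the hash input, each new contract created by the same account receives a fresh, predictable address.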

The Ethereum state model (accounts and states) differs from that of Bitcoin. The Bitcoin blockchain stores unspent transaction outputs (UTXOs), and the balance of an address is calculated off-chain by wallet software.
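The two state models can be contrasted in a small in-memory sketch. The account names and amounts are hypothetical, and signatures, fees, and nonces are omitted. In the UTXO model a balance is derived by scanning outputs, and a spend consumes whole outputs and creates change. In the account model the state stores each balance directly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TxOutput:
    owner: str    # hypothetical address label
    amount: int   # smallest currency unit

def utxo_balance(utxos: list[TxOutput], owner: str) -> int:
    """What wallet software computes off-chain: the sum of outputs an owner controls."""
    return sum(o.amount for o in utxos if o.owner == owner)

def utxo_spend(utxos, owner, recipient, value, change_owner):
    """Consume whole outputs of `owner`, pay `recipient`, send change to `change_owner`."""
    chosen, total = [], 0
    for i, o in enumerate(utxos):
        if total >= value:
            break
        if o.owner == owner:
            chosen.append(i)
            total += o.amount
    if total < value:
        raise ValueError("insufficient funds")
    kept = [o for i, o in enumerate(utxos) if i not in chosen]
    new = [TxOutput(recipient, value)]
    if total > value:
        new.append(TxOutput(change_owner, total - value))  # change output
    return kept + new

def transfer(state: dict[str, int], sender: str, recipient: str, value: int) -> None:
    """Account model: debit and credit balances stored explicitly in the state."""
    if state.get(sender, 0) < value:
        raise ValueError("insufficient balance")
    state[sender] -= value
    state[recipient] = state.get(recipient, 0) + value

utxo_set = [TxOutput("alice", 30), TxOutput("bob", 50), TxOutput("alice", 20)]
account_state = {"alice": 50, "bob": 50}
```

The change output may go to a fresh address (`change_owner`). The account model has no equivalent of this.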

2.2 Transactions and Gas

The halting problem – determining if a given program will ever halt – is unsolvable in the general case [Chu36]. This poses a challenge: nodes running the Ethereum virtual machine (EVM) cannot foresee the amount of resources required for validating a transaction, which enables denial-of-service attacks.

To overcome the issue, the Ethereum protocol incorporates a pricing mechanism that makes resource-intensive computations in smart contracts economically infeasible. Every computational step in the EVM is priced in units of gas. EVM opcodes and their gas costs are defined in the Yellow Paper [Woo14]. The price of a gas unit in ether is determined by the market. For every transaction, the sender specifies the maximum amount of gas that the intended computation is expected to consume (the gas limit) and the price the user wishes to pay per unit of gas (the gas price). The sender prepays the gas limit multiplied by the gas price. If the execution is successful, the ether for unused gas is refunded, so the fee equals the gas actually used multiplied by the gas price. If an error occurs, all state changes made by the execution are reverted, but all provided gas is consumed. Miners can vote to gradually change the limit on the total amount of gas consumed in a block [jnn15].
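The fee arithmetic can be written out as a small function. This is a simplified sketch: it ignores the intrinsic base cost of a transaction and gas refunds (for example, for clearing storage), and the numbers in the usage example are hypothetical.

```python
GWEI = 10**9  # 1 gwei = 10^9 wei

def settle_gas(gas_limit: int, gas_price: int, gas_used: int, success: bool):
    """Return (prepaid, fee, refund) in wei for one transaction.

    The sender prepays gas_limit * gas_price. On success only the gas actually
    used is charged. On an exceptional halt (e.g., out of gas) all gas is consumed.
    """
    if gas_used > gas_limit:
        raise ValueError("execution cannot use more gas than the limit")
    prepaid = gas_limit * gas_price
    charged_gas = gas_used if success else gas_limit
    fee = charged_gas * gas_price
    return prepaid, fee, prepaid - fee
```

For example, `settle_gas(50_000, 20 * GWEI, 21_000, True)` prepays 10^15 wei, charges 4.2 × 10^14 wei, and refunds 5.8 × 10^14 wei. A failed execution with the same limit is charged the full 10^15 wei.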

A transaction is a signed data structure comprising a set of instructions to be atomically executed by the EVM. It consists of the following fields:

  • nonce – the number of transactions sent by the sender;

  • gasPrice – the number of wei per gas unit that the sender is paying;

  • gasLimit – the maximum amount of gas to be spent during execution;

  • to – the destination address (empty for contract creation transactions);

  • value – the number of wei transferred along with the transaction;

  • v, r, s – signature data.

There are two types of transactions in Ethereum. A contract creation transaction is used to deploy a new contract. It contains an additional init field with EVM code that is run on contract creation and returns the EVM code of the new contract. A message call transaction is used to execute a function of an existing contract (with arguments specified by an optional data field) or to transfer ether.
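The transaction fields and the two transaction types can be modeled as a plain data structure. This is a sketch under stated assumptions. Following the Yellow Paper, contract creation is represented by an empty `to` field rather than by the all-zero address. A single `data` field stands in for both the init code and the call data. Signing and serialization are out of scope. The `upfront_cost` helper reflects the rule that the sender's balance must cover `gas_limit * gas_price + value` before execution.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    nonce: int        # number of transactions previously sent by the sender
    gas_price: int    # wei per unit of gas
    gas_limit: int    # maximum gas the execution may consume
    to: bytes         # 20-byte recipient address; empty for contract creation
    value: int        # wei transferred along with the transaction
    data: bytes       # init code (creation) or call data (message call)
    v: int = 0        # signature fields (not computed in this sketch)
    r: int = 0
    s: int = 0

    def is_contract_creation(self) -> bool:
        return len(self.to) == 0

    def kind(self) -> str:
        return "contract creation" if self.is_contract_creation() else "message call"

    def upfront_cost(self) -> int:
        """Wei the sender must hold before execution starts."""
        return self.gas_limit * self.gas_price + self.value
```

With this representation, a transaction sent to the all-zero address `bytes(20)` is an ordinary message call and does not create a contract.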

2.3 Block Structure and Mining

Ethereum uses proof-of-work (PoW): nodes compete to find a partial collision of a cryptographic hash function and produce the next blockFootnote 3. Both Bitcoin [Wui17] and Ethereum [Joh17] choose the heaviest chain as the valid one in case of forks, where a chain’s weight is defined as the sum of its blocks’ difficulties.
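Both ideas can be sketched briefly. The first is finding a nonce whose hash falls below a target derived from the difficulty. The second is fork choice by total difficulty rather than by length. The hash is a stand-in: double SHA-256 in the style of Bitcoin, not Ethash. Chains are represented simply as lists of block difficulties.

```python
import hashlib
from typing import List, Optional

def pow_hash(header: bytes, nonce: int) -> int:
    """Stand-in PoW hash (double SHA-256); Ethereum uses Ethash instead."""
    data = header + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(hashlib.sha256(data).digest()).digest(), "big")

def mine(header: bytes, difficulty: int, max_tries: int = 1_000_000) -> Optional[int]:
    """Search for a nonce whose hash is below 2**256 // difficulty."""
    target = 2**256 // difficulty
    for nonce in range(max_tries):
        if pow_hash(header, nonce) < target:
            return nonce
    return None

def heaviest_chain(chains: List[List[int]]) -> List[int]:
    """Pick the fork with the largest total difficulty, not the most blocks."""
    return max(chains, key=sum)
```

A higher difficulty lowers the target, so on average it takes proportionally more hashes to find a valid nonce. For this reason, a shorter chain of harder blocks can outweigh a longer chain of easier ones.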

Good connectivity is crucial for Bitcoin mining: the resources spent mining on a block other than the latest one are essentially wasted. Good connectivity puts big pools at an advantage, while blocks from worse-connected miners propagate slowly and increase the orphan rate. Thus Bitcoin mining is prone to centralization. To operate with block times much shorter than Bitcoin’s 10 min (about 30 s in September 2017), Ethereum uses a mining protocol [doc17] similar to GHOST [SZ13]. Ethereum also rewards uncles: valid stale blocks that are not themselves ancestors of the current block, but whose parent is one (at most 6 generations back). For each block, the miner receives a static reward of 5 ether, payments for the gas consumed by transactions in the block, and 1/32 of the static reward (0.15625 ether) per uncle referenced in the block (no more than 2 uncles per block). Miners of included uncles receive 7/8 of the static reward (4.375 ether) if the uncle is one generation behind the including block, and 1/8 less for each additional generation. Due to uncles, the energy spent on orphan blocks contributes to security, increasing the amount of work required for a double-spend.
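The reward rules can be expressed in integer wei. The sketch assumes the 2017 static reward of 5 ether. It also assumes the Yellow Paper uncle formula, (8 − depth)/8 of the static reward, where depth is the number of generations between the uncle and the including block: 7/8 at depth 1, down to 2/8 at depth 6.

```python
ETHER = 10**18               # wei per ether
STATIC_REWARD = 5 * ETHER    # static block reward as of September 2017

def miner_reward(n_uncles: int, gas_fees: int = 0) -> int:
    """Static reward + gas fees + 1/32 of the static reward per included uncle."""
    if not 0 <= n_uncles <= 2:
        raise ValueError("at most two uncles per block")
    return STATIC_REWARD + gas_fees + n_uncles * STATIC_REWARD // 32

def uncle_reward(depth: int) -> int:
    """Reward for the miner of an uncle `depth` generations behind the including block."""
    if not 1 <= depth <= 6:
        raise ValueError("uncles may be 1 to 6 generations behind")
    return STATIC_REWARD * (8 - depth) // 8
```

For a block that includes two uncles, at depths 1 and 2, the including miner gets 5.3125 ether plus fees. The uncle miners get 4.375 and 3.75 ether.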

Unlike Bitcoin, where coins are issued at a diminishing rate with a total cap of 21 million, Ethereum issues ether at a constant rate with no total cap. Ethereum’s issuance parameters may change after switching to proof-of-stake (see Sect. 3.1).
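The contrast between capped and linear issuance can be computed directly. The sketch assumes Bitcoin's 50 BTC initial subsidy, halved with integer division every 210,000 blocks, and Ethereum's 5 ether static reward as of 2017. Uncle rewards and the initial ether sale are ignored.

```python
SATOSHI_PER_BTC = 100_000_000
HALVING_INTERVAL = 210_000      # blocks between Bitcoin subsidy halvings

def bitcoin_total_supply() -> int:
    """Sum all block subsidies, in satoshi, until integer halving reaches zero."""
    subsidy, total = 50 * SATOSHI_PER_BTC, 0
    while subsidy > 0:
        total += HALVING_INTERVAL * subsidy
        subsidy //= 2           # integer halving, so sub-satoshi amounts are dropped
    return total

def ether_issued(blocks: int, reward_ether: int = 5) -> int:
    """Static issuance in ether grows linearly with the number of blocks."""
    return blocks * reward_ether
```

Because of the integer rounding, the Bitcoin total comes to 2,099,999,997,690,000 satoshi, slightly under 21 million BTC. The Ethereum figure keeps growing with every block.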

Bitcoin PoW uses SHA-256, a general-purpose cryptographic hash function that can be efficiently implemented in hardware. Specialized mining equipment (application-specific integrated circuits, ASICs) is orders of magnitude more efficient than commodity hardware, which puts small miners at a disadvantage. Ethereum uses Ethash, a memory-hard hash function, and targets GPUs as the primary mining equipment. This helps prevent mining centralization akin to Bitcoin’s and discourages CPU mining, which would otherwise let attackers use botnets or cloud VM instances rented for a short time to perform an attack.

Table 1 compares some properties of Bitcoin and Ethereum. Note that the practical requirements regarding the disk space for an Ethereum node can be greatly reduced due to the explicit storage of account balances and data as opposed to Bitcoin’s UTXO [Dom17].

Table 1. Bitcoin and Ethereum, September 2017 [Eth17d, Bit17c, Eth17e, Bit17b, Coi17a]

2.4 Smart Contract Programming

EVM bytecode is a low-level, Turing complete, stack-based language operating on 256-bit words. It is designed to be simple compared to general-purpose VMs like the JVM, to execute deterministically, and to natively support cryptographic primitives [But17b]. Developers usually write contracts in high-level languages targeting the EVM, the most popular one being Solidity [Sol17], a statically typed language with a JavaScript-like syntax. Others include Serpent [Ser17] (deprecated in 2017 [Cas17]) and LLL [Ell17], with Python- and Lisp-like syntax, respectively.
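The stack-based, 256-bit, gas-metered execution model can be shown with a toy interpreter. It covers only a handful of real opcodes (STOP 0x00, ADD 0x01, MUL 0x02, SUB 0x03, PUSH1 0x60) with their Yellow Paper gas costs. Everything else about the EVM (memory, storage, calls, jumps) is left out, so this is an illustration rather than an EVM implementation.

```python
WORD = 2**256  # EVM words are 256 bits; arithmetic wraps modulo 2**256

# Tiny subset of EVM opcodes: opcode -> (name, gas cost)
OPCODES = {
    0x00: ("STOP", 0),
    0x01: ("ADD", 3),
    0x02: ("MUL", 5),
    0x03: ("SUB", 3),
    0x60: ("PUSH1", 3),
}

class OutOfGas(Exception):
    pass

def run(code: bytes, gas: int) -> tuple[list[int], int]:
    """Execute `code` on an empty stack; return (stack, gas_remaining)."""
    stack: list[int] = []
    pc = 0
    while pc < len(code):
        name, cost = OPCODES[code[pc]]
        if cost > gas:
            raise OutOfGas(f"{name} at pc={pc}")
        gas -= cost
        if name == "STOP":
            break
        if name == "PUSH1":
            stack.append(code[pc + 1])   # one-byte immediate operand
            pc += 2
            continue
        a, b = stack.pop(), stack.pop()  # a is the top of the stack
        if name == "ADD":
            stack.append((a + b) % WORD)
        elif name == "MUL":
            stack.append((a * b) % WORD)
        elif name == "SUB":
            stack.append((a - b) % WORD)  # top minus second, as in the EVM
        pc += 1
    return stack, gas
```

Running `PUSH1 2, PUSH1 3, ADD` with 100 gas leaves `[5]` on the stack and 91 gas unused. Computing `0 - 1` wraps around to `2**256 - 1`.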


2.5 Applications

Among many potential applications of smart contracts [McA17], crowdfunding is arguably the first widely successful one. The first wide-scale Ethereum-based crowdfunding project was a decentralized investment fund called The DAO, launched on 30 April 2016Footnote 4. In 2017, the amount of money collected during so-called initial coin offerings (ICOs) skyrocketed, reaching $1.8 bn [Coi17b] and surpassing early-stage venture capital funding [Sun17]. An ICO is usually built around a token: a smart contract that maintains a list of users’ balances and allows them to transfer tokens or buy and sell them for ether. Tokens usually implement the API defined in the ERC20 standard [Vog17]. The ICO organizers often promise that the tokens will be required to use the to-be-developed product or service. Prominent Ethereum applications include decentralized file storage [Fil17, Sia17, Sto17] and computation [Gol17, Son17], name systems [ENS17], and prediction markets [Aug17, Gno17].
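The ledger logic behind an ERC20-style token can be modeled in a few lines. This is an in-memory Python model, not Solidity and not the standard's text. Method names follow ERC20 (`totalSupply`, `balanceOf`, `transfer`, `approve`, `allowance`, `transferFrom`). The caller is passed explicitly where a contract would use `msg.sender`, and events are omitted.

```python
class Token:
    """In-memory model of an ERC20-style token ledger."""

    def __init__(self, supply: int, issuer: str):
        self.total_supply = supply
        self.balances = {issuer: supply}
        self.allowances: dict[tuple[str, str], int] = {}  # (owner, spender) -> amount

    def totalSupply(self) -> int:
        return self.total_supply

    def balanceOf(self, who: str) -> int:
        return self.balances.get(who, 0)

    def transfer(self, sender: str, to: str, value: int) -> bool:
        if self.balanceOf(sender) < value:
            return False
        self.balances[sender] -= value
        self.balances[to] = self.balanceOf(to) + value
        return True

    def approve(self, owner: str, spender: str, value: int) -> bool:
        self.allowances[(owner, spender)] = value  # overwrite any previous allowance
        return True

    def allowance(self, owner: str, spender: str) -> int:
        return self.allowances.get((owner, spender), 0)

    def transferFrom(self, spender: str, owner: str, to: str, value: int) -> bool:
        """Let `spender` move up to its approved allowance out of `owner`'s balance."""
        if self.allowance(owner, spender) < value:
            return False
        if not self.transfer(owner, to, value):
            return False
        self.allowances[(owner, spender)] -= value
        return True
```

The `approve`/`transferFrom` pair is what allows a third party, such as an exchange contract, to move tokens on a holder's behalf.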

3 Open Problems

3.1 Core Protocol

Cryptographic Primitives. Ethereum uses ECDSA for signaturesFootnote 5, Keccak-256 for generating unique identifiersFootnote 6, and Ethash [Eth17a] for proof-of-work. Based on Dagger [But13] and Hashimoto [Dry14], Ethash is a memory-intensive, GPU-friendly, and ASIC-resistant hash functionFootnote 7.

The algorithm is composed of four steps. In the first step, a seed is computed for the current epoch (an epoch consists of 30 thousand blocks) by repeatedly applying Keccak-256, once per elapsed epoch, starting from 32 zero bytes. In the second step, a pseudorandom cache (16 MB initially) is generated from the seed using a memory-hard hash function. In the third step, done once per epoch, a linearly growing dataset (approximately 2 GB in 2017 [DAG17]) consisting of 64-byte elements is generated from the cache using the non-cryptographic Fowler-Noll-Vo hash function [Nol17]. In the fourth step, a header and a nonce are repeatedly hashed together with pseudorandomly selected parts of the dataset until the result satisfies the difficulty target.
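The seed and mixing primitives can be sketched as follows. The sketch assumes the reference Ethash description: the seed depends only on the epoch number (32 zero bytes hashed once per elapsed epoch), and FNV mixing is applied to whole 32-bit words rather than byte by byte. `hashlib.sha3_256` stands in for Keccak-256, which Python's standard library does not provide, so seeds for epochs 1 and later differ from real Ethash seeds. Cache and dataset generation are not shown.

```python
import hashlib

EPOCH_LENGTH = 30_000    # blocks per epoch
FNV_PRIME = 0x01000193   # 32-bit FNV prime

def epoch(block_number: int) -> int:
    return block_number // EPOCH_LENGTH

def seed_hash(block_number: int) -> bytes:
    """Seed for the block's epoch: 32 zero bytes hashed once per elapsed epoch.

    ASSUMPTION: SHA3-256 stands in for Keccak-256 (different padding).
    """
    seed = bytes(32)
    for _ in range(epoch(block_number)):
        seed = hashlib.sha3_256(seed).digest()
    return seed

def fnv(v1: int, v2: int) -> int:
    """Ethash-style FNV mix of two 32-bit words: multiply by the prime, then XOR."""
    return ((v1 * FNV_PRIME) ^ v2) & 0xFFFFFFFF
```

All blocks in the same epoch share a seed, and therefore a cache and dataset. This is why the expensive dataset generation needs to happen only once every 30,000 blocks.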

Both Dagger and Hashimoto, in contrast to functions selected through standardization efforts like the SHA-3 competition [SHA17] or the Password Hashing Competition [PHC15], were announced shortly before the Ethereum launch and did not undergo significant cryptanalysis in the academic community. The Ethash design rationale [Eth17b] lacks details on why established and well-tested memory-hard hash functions would not serve the purpose. [Ler14] claims that an earlier version of Dagger (as of 2014) was flawed. Rigorous cryptanalysis of Ethereum’s underlying cryptographic primitives is required to guarantee its long-term security.

Consensus Mechanism. Though some argue that PoW is the only viable blockchain consensus mechanism [And14, Szt15], Ethereum is planning to switch from proof-of-work to proof-of-stake (PoS) [Her17]. As of September 2017, the first step of a two-stage process is due October 2017, transitioning Ethereum to a hybrid PoW-PoS consensus mechanism. The second step will make Ethereum fully PoS. PoS aims to address the drawbacks of PoW:

  • energy consumption comparable to a mid-sized country as of 2017 [Dig17];

  • centralization risks: miners are incentivized to invest in specialized hardware, which pushes up the entry cost of participating and puts big miners at an advantage due to economies of scale;

  • game-theoretic attacks like selfish mining [ES13].

PoS can be described as “virtual mining”: a miner purchases coins instead of hardware and electricity. The consensus mechanism distributes power proportionally to the amount of coins miners hold (stake), not computing power (see [BGM16] for a review of cryptocurrencies without PoW). Known issues with naive PoS implementations include:

  • Nothing-at-stake. As producing new blocks incurs only a negligible cost, a rational PoS validator extends all known chains to get a reward regardless of which one wins. This opens the door to attacks that require far less than 51% of the stakeFootnote 8: the attacker’s chain wins if the attacker supports it exclusively, whereas other validators behave rationally and support all chains.

  • Randomly choosing validators. Using randomness from the blockchain itself (e.g., the previous block hash) to determine the next validator is insecure, as that value is influenced by the validators of previous rounds. A possible solution is to use verifiable secret sharing for randomness generation.

  • Transaction finality. In PoW, a block header which has a hash less than the target simultaneously represents the choice of the next validator and the very act of validating the block. PoS separates choosing the next validators and producing the block. A PoS validator may create its own chain, plug in a constant instead of a pseudo-random number generator (PRNG) output, and produce blocks despite owning an arbitrarily small stake.

    A rule of thumb in Bitcoin considers transactions older than six blocks final, as the chance of a minority attacker overtaking the main chain becomes negligible. By contrast, as PoS blocks cost nearly nothing to produce, an attacker can secretly create an alternative chain starting from the genesis block. To prevent such long-range attacks, a PoS blockchain must provide finality, i.e., guarantee that after a fixed number of blocks old transactions cannot be reversedFootnote 9.
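Stake-proportional selection, and the problem with seeding it from the previous block hash, can be shown in a short sketch. It is illustrative only and does not describe how Casper works. The validator names and stakes are hypothetical. A validator who produces the block that becomes the seed can try many block variants ("grinding") until the next selection favors itself.

```python
import hashlib
from typing import Optional, Tuple

def select_validator(stakes: dict[str, int], seed: bytes) -> str:
    """Pick a validator with probability proportional to stake, driven by `seed`."""
    total = sum(stakes.values())
    ticket = int.from_bytes(hashlib.sha256(seed).digest(), "big") % total
    for name in sorted(stakes):          # fixed order so every node agrees
        if ticket < stakes[name]:
            return name
        ticket -= stakes[name]
    raise AssertionError("unreachable: ticket is always below the total stake")

def grind(stakes: dict[str, int], attacker: str, prev_hash: bytes,
          tries: int = 10_000) -> Optional[Tuple[int, bytes]]:
    """Try block variants until the resulting hash (the next seed) selects `attacker`."""
    for variant in range(tries):
        candidate = hashlib.sha256(prev_hash + variant.to_bytes(4, "big")).digest()
        if select_validator(stakes, candidate) == attacker:
            return variant, candidate
    return None
```

With honest seeds, a validator holding 10% of the stake is chosen about 10% of the time. A validator that controls the seed, however, typically finds a self-selecting variant within a few attempts.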

The central concept of the proposed Ethereum PoS algorithm Casper [But16a] is “consensus by bet”: validators bet on the future blockchain state [PoS16, But17c]. Casper addresses the nothing-at-stake problem by introducing validator punishments for incorrect behavior (e.g., extending multiple chains) in addition to rewards, which makes the game-theoretic analysis of the protocol more complex. Long-range attacks are addressed with the concept of finality [But17a].

Recent PoS designs also include 2-hop blockchain [DFZ16], Algorand [Mic16], Ouroboros [KRDO16], SnowWhite [DPS16], and Proof of Luck [MHWK17]. The Ripple [SYB14] and Stellar [Maz14] networks use consensus mechanisms inspired by Byzantine fault tolerant consensus protocols like PBFT [CL02]. Developing an efficient, secure, and incentive-compatible PoS algorithm is an important task in blockchain research.

Scalability. Open blockchains deliberately sacrifice performance for what smart contract pioneer Nick Szabo describes as social scalability [Sza17]: “the ability of an institution [...] to overcome shortcomings in human minds [...] that limit who or how many can successfully participate”. Both Bitcoin and Ethereum have been facing scalability problems [Sil16, Bit17a]. Improving blockchain scalability while minimally sacrificing security is an important research direction. It involves two goals: increasing transaction throughput, and decreasing the bandwidth, storage, and processing power required of nodes (thus preserving decentralization).

The first goal can be addressed by payment channel networks and sharding. A bidirectional payment channel is a protocol that lets two users exchange signed transactions off-chain and publish only the final state on-chain as a settlement. A payment channel network routes a payment through a sequence of payment channels across the network, a mechanism similar to IP packet routing [McC15]. Payment channel networks for Bitcoin [Lig16] and Ethereum [Rai17] are in development.
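The off-chain balance updates and route finding can be modeled in a simplified way. The class and its methods are hypothetical and illustrative. Each channel is stored as two directed balances. A route is found by breadth-first search over directions with enough balance, and a payment simply shifts balances along the route. Real designs such as the Lightning Network also use hash time-locked contracts to make multi-hop payments atomic. That step, together with signatures and on-chain settlement, is omitted here.

```python
from collections import deque

class ChannelNetwork:
    """Hypothetical model: each channel holds two balances, updated off-chain."""

    def __init__(self):
        self.balance = {}   # (u, v) -> amount u can currently send to v

    def open_channel(self, u, v, deposit_u, deposit_v):
        self.balance[(u, v)] = deposit_u
        self.balance[(v, u)] = deposit_v

    def find_route(self, src, dst, amount):
        """Breadth-first search over channel directions with enough balance."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if u == dst:
                path = []
                while u is not None:
                    path.append(u)
                    u = parent[u]
                return path[::-1]
            for (a, b), cap in self.balance.items():
                if a == u and b not in parent and cap >= amount:
                    parent[b] = u
                    queue.append(b)
        return None

    def pay(self, src, dst, amount):
        """Shift balances along the route; no on-chain transaction is needed."""
        path = self.find_route(src, dst, amount)
        if path is None:
            return False
        for u, v in zip(path, path[1:]):
            self.balance[(u, v)] -= amount
            self.balance[(v, u)] += amount
        return True
```

After Alice pays Carol 5 through Bob, each channel on the route has shifted by 5. This also gives Carol capacity to pay back along the reverse direction.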

In open blockchains, every node is usually required to process every transaction. This provides strong security, but severely limits scalability. Sharding [GvRS16, LNZ+16] might alleviate this problem by spreading transactions across groups of nodes (shards). Each shard should be large enough to provide a sufficient level of security, while the network as a whole achieves significantly better throughput [Sha16].
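A hypothetical assignment of accounts to shards shows both the appeal and the catch. Here an account's shard is taken from its hash, which is an illustrative choice and not a specific protocol's rule. Each shard then processes only its share of transactions. Transactions whose sender and recipient fall in different shards, however, need cross-shard coordination, which is a central difficulty for sharded designs.

```python
import hashlib
from collections import Counter

def shard_of(address: str, n_shards: int) -> int:
    """Hypothetical assignment: an account lives in the shard given by its hash."""
    digest = hashlib.sha256(address.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

def route_transactions(txs, n_shards):
    """Count (sender, recipient) transfers per sender shard; count cross-shard ones."""
    per_shard = Counter()
    cross_shard = 0
    for sender, recipient in txs:
        s = shard_of(sender, n_shards)
        per_shard[s] += 1
        if shard_of(recipient, n_shards) != s:
            cross_shard += 1
    return per_shard, cross_shard
```

With random-looking addresses spread over 4 shards, each shard handles about a quarter of the load. About three quarters of the transfers, however, cross shard boundaries.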

The second goal can be addressed by skipping the validation of old blocks [Jun17] or by additionally providing new nodes with full snapshots of a previous state [Par17].

Privacy. Most open blockchainsFootnote 10, including Ethereum, guarantee integrity and availability, but provide little to no privacy. All transactions are broadcast in plaintext and can be intercepted (or later obtained from the blockchain) and analyzed. Deanonymization of blockchain transactions is an active business area with start-ups (e.g., [Cha17b]) offering blockchain analysis tools, which is in line with government demands of KYC/AML compliance for financial services.

A common but only partially effective privacy-preserving practice in Bitcoin, which takes advantage of the UTXO structure of its state, is to use a new address for every transaction. This technique is not applicable in Ethereum, because Ethereum uses addresses for authentication and explicitly maps them to account states. For instance, if a user purchases tokens using a particular address, they have to use the same address to redeem them.

An additional privacy challenge is keeping the business logic implemented in smart contracts confidential. Though Ethereum only stores bytecode, users are reluctant to trust contracts without published source code. Moreover, bytecode analysisFootnote 11 tools are already available [NPS+17, Sui17]. Possible research directions in the privacy domain include privacy-preserving smart contracts with zero-knowledge proofs [KMS+15] (support for zero-knowledge proofs in Ethereum was first tested in September 2017 [O’L17]), mixing, computations on encrypted data, and code obfuscation.

3.2 Smart Contract Programming

Programming Languages. Security is of paramount importance in smart contract programming [ABC17, DAK+15]. Unlike traditional software, smart contracts cannot be patched, which brings new challenges to blockchain programming [PPMT17]. There are multiple approaches to contract programming [STM16]. Areas of research in this domain include systematizing good and bad programming practices [Con16, CLLZ17], designing general-purpose [Hir17a, But17d, PE16] as well as domain-specific [BKT17, EMEHR17] smart contract programming languages, and developing tools for automated security analysis [LCO+16, Sec17] and formal verification [BDLF+16] of smart contract source code, EVM bytecode, and the EVM itself [Hir17b].

Secure Contract Programming. An important challenge is to describe the smart contract execution model (possibly drawing parallels with concurrent programming on a multi-threaded processor [SH17]) and to develop a usable and formally verifiable high-level language reflecting this model. Some argue that Solidity inclines programmers towards unsafe development practices [ydt16]. Typical vulnerabilities and issues in Solidity include:

  1. Re-entrancy. Contracts can call each other, and a malicious external contract can call back into the contract that called it. If the victim contract does its internal bookkeeping after returning from an external call, its integrity can be compromisedFootnote 12.

  2. Miner’s influence. Miners can, to some extent, influence execution (front-running, censorship, or altering environmental variables, e.g., the timestamp).

  3. Out-of-gas exceptions. Computation in Ethereum is many orders of magnitude more expensive than with centrally managed cloud computing services. Developers who do not take this into account may implement functions that require too much gas to fit in the block gas limit and thus always fail.
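The re-entrancy pattern can be simulated with a Python model. It is not Solidity, and the class names are hypothetical. The vulnerable bank sends funds before zeroing the caller's balance, so an attacker whose `receive` callback re-enters `withdraw` is paid repeatedly out of other users' deposits. The safe variant follows the "checks-effects-interactions" ordering and updates its books before the external call.

```python
class Account:
    """An account that accepts funds; contract-like accounts may override receive()."""
    def __init__(self):
        self.received = 0

    def receive(self, bank, amount):
        self.received += amount

class VulnerableBank:
    def __init__(self):
        self.balances = {}   # depositor -> recorded balance
        self.reserve = 0     # funds actually held by the contract

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.reserve += amount

    def withdraw(self, who):
        amount = self.balances.get(who, 0)
        if 0 < amount <= self.reserve:
            self.reserve -= amount
            who.receive(self, amount)    # external call first (the bug) ...
            self.balances[who] = 0       # ... bookkeeping afterwards

class SafeBank(VulnerableBank):
    def withdraw(self, who):
        amount = self.balances.get(who, 0)
        if 0 < amount <= self.reserve:
            self.balances[who] = 0       # effects before interactions
            self.reserve -= amount
            who.receive(self, amount)

class Attacker(Account):
    """receive() re-enters withdraw() while the recorded balance is still unchanged."""
    def __init__(self, max_reentries=3):
        super().__init__()
        self.reentries_left = max_reentries

    def receive(self, bank, amount):
        super().receive(bank, amount)
        if self.reentries_left > 0:
            self.reentries_left -= 1
            bank.withdraw(self)
```

Suppose a victim deposits 30 and the attacker deposits 10. The vulnerable bank pays the attacker 40 and is left empty while still recording the victim's 30. The safe bank pays out only the attacker's 10.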

Trusted Data Sources. Many smart contract applications (financial derivatives, insurance, prediction markets) depend on real-world data. Ethereum is isolated from the broader Internet to guarantee consistent execution across nodes. A popular approach to providing data to contracts in a trust-minimizing way is an oracle – a specialized data provider, possibly with a dedicated cryptographic protocol to guarantee integrity [Ora17]. A recent development is TownCrier – an oracle built with trusted hardware [ZCC+16].

3.3 Higher Level Issues

Governance. In June 2016, a massive Ethereum-based crowdfunding project, The DAO, ended in disaster: an unknown hacker exploited a bug in the smart contract and obtained around $50 million of the $150 million collected [Sir16]. Although the Ethereum protocol correctly executed the smart contract code, the Ethereum developers implemented a hard fork that allowed stakeholders to withdraw their deposits. This event raised concerns about Ethereum’s governance, as the fork violated the premise of decentralized applications running “exactly as programmed” and led to the creation of Ethereum Classic [Eth17c]. Governance mechanisms should provide certainty over how updates (potentially breaking compatibility) are introduced.

Though the gas price in ether is determined by the market, the relative gas costs of EVM opcodes are fixed by the protocol. In September 2016, an attacker exploited a weakness in gas pricing and organized a DoS attack on the network, taking advantage of the fact that certain operations were under-priced [But16b]. The problem was ultimately fixed with a hard fork. Research is needed to propose more flexible mechanisms for determining the relative prices of EVM operations.

Incentives. Open blockchains rely on the participants’ rationality [CXS+17] and must maintain incentive compatibility, so that rational behavior benefits the network as a whole [LTKS15]. This introduces a new field of study dubbed cryptoeconomics: the study of incentives in cryptographic systems. The trustless nature of smart contracts might be used for benign purposes (managing mining pools [LVTS17]) as well as for malicious ones (providing automatic rewards for attacking mining pools [VTL17]). Rigorous research based on a definition of rational behavior is needed to guarantee the proper functioning of blockchain networks and applications.

Usability. Considering the influx of new people into the blockchain space, usable yet secure lightweight blockchain software is needed. From the human-computer interaction (HCI) perspective, a challenging task would be to help users grasp smart contract fundamentals without going into technicalities. Research shows that cryptographically sound systems may fail to gain traction due to usability issues [RAZS15]. HCI research is needed to make blockchains and smart contracts usable by the general public.

Ethical and Legal Issues. Information security researchers usually adhere to the “responsible disclosure” policy: they report a bug privately to the vendor and give developers time to fix it before publishing the information in the open. Though some oppose this practice [Sch07], it is assumed to decrease the probability of an attack on the live system (unless the attackers discover the same bug independently before a patch is applied). Ethereum introduces a new dimension to the responsible disclosure debate, as smart contracts cannot be patched. It is unclear whether it is ethical to fully disclose a vulnerability discovered in a smart contract if developers cannot fix it anywayFootnote 13.

A whole separate range of topics, which is outside the scope of this paper, is how (and whether at all) smart contracts fit into existing legal frameworks. For instance, BitLicense [ofs15], a controversial [Act15] piece of regulation that came into force in New York in 2015, prompted many cryptocurrency businesses to withdraw their services from the residents of this US state [Rob15]. In July 2017, the US Securities and Exchange Commission stated that issuers of digital assets may be subject to the requirements of US law [SC17].

4 Conclusion

Ethereum is a fascinating research area at the intersection of multiple fields: cryptography and distributed systems, programming languages and formal verification, economics and game theory, human-computer interaction, finance and law. The promise of smart contracts is not limited to making existing processes more efficient by putting parts of their logic onto a very inefficient, yet very secure decentralized network. This new way of handling value without a trusted third party opens up whole new classes of previously impossible use cases. Thorough research is needed to realize this vision.