1 Introduction

Millions of Ethereum smart contracts manage assets worth hundreds of billions of dollars. The ERC20 fungible token is the most popular type of smart contract on Ethereum and is often compared to a decentralized bank account. Ethereum has two types of accounts: externally owned accounts (EOAs) and smart contracts. An EOA has an associated private key and can deploy smart contracts, but cannot execute custom code. A smart contract, on the other hand, can execute custom code, but has no associated private key that would determine its owner. The EOA that deploys a contract does not automatically own it unless the contract developer implements this functionality manually. Moreover, any functionality related to ownership, role-based access, or other special permissions must be implemented manually by the developer; otherwise, the contract becomes orphaned the moment it is deployed.

Many smart contracts use routines from the OpenZeppelin Contracts [3] library to implement ownership and role-based access. A recent analysis by Zhou et al. [17] shows that at least 2.1 million Ethereum smart contracts, out of 5.8 million total, use the onlyOwner modifier from the OpenZeppelin Contracts library, which allows only a certain user (i.e., the owner) to call the functions guarded by this modifier. Figure 1 shows a Venn diagram of the relationships between different subsets of Ethereum smart contracts from the perspective of this research. Specifically, we subdivide all smart contracts into two major categories: administrated contracts and effectively ungoverned contracts. We emphasize that not all contracts that have an owner are necessarily administrated, because the ownership may be purely symbolic or may allow only harmless operations. Administrated smart contracts are characterized by two major properties: a) there is at least one Ethereum account whose owner possesses a unique privileged status; b) the privileged status allows that user to perform actions that may affect other users of the smart contract. These two properties constitute the difference between administrated and ownable smart contracts: an ownable smart contract only needs to meet the first property, and there are smart contracts that have an owner who has no power to disrupt the operation of the contract (Footnote 1). We further refer to non-administrated smart contracts as effectively ungoverned; this set includes the ownable non-administrated contracts, many of which are ERC20 tokens (Footnote 2). In this work, however, we focus on administrated ERC20 tokens, and our goal is to introduce a novel subset of these tokens: safely administrated ERC20 tokens.

The evident popularity of owned smart contracts and ERC20 tokens leads us to the following research question: how many unique administrated ERC20 tokens are deployed on Ethereum? To answer this question, we develop an extractor of 9 syntactic features characterizing administrated ERC20 tokens. We then gather 1,173,271 open-source smart contracts written in the Solidity programming language and, by removing duplicates, reduce the dataset to 84,062 unique smart contracts, which we assume to be independent and identically distributed (i.i.d.). We further select 385 random contracts for manual labeling in order to choose the most accurate classifier among several candidates. Finally, we use the 9 features and the chosen classifier to determine the approximate percentage of administrated ERC20 contracts deployed on the Ethereum Mainnet blockchain. Our evaluation shows that nearly 58% of all the smart contracts and almost 90% of all ERC20 tokens are administrated ERC20 tokens. To the best of our knowledge, we are the first to conduct an Ethereum-wide evaluation of administrated ERC20 tokens and quantify their ubiquity.

To mitigate the potential adverse effects of administrated ERC20 tokens in a low-regulated economic environment, we propose SafelyAdministrated, a Solidity library that allows developers of ERC20 tokens to implement the most common administrated patterns in a safe and responsible way, thereby increasing trust in their products without giving up control over certain operations (e.g., upgrades).

Fig. 1. Venn diagram of different types of Ethereum smart contracts.

In summary, we make the following contributions:

  • We analyze the class of administrated ERC20 tokens and show that these contracts are more owner-controlled and less safe than the services they try to disrupt, such as banks and centralized online payment systems.

  • We develop a binary classifier for identification of administrated ERC20 tokens, and conduct extensive data analysis, which reveals that nearly 9 out of 10 ERC20 tokens on Ethereum are administrated, and thereby unsafe to engage with even under the assumption of trust towards their owners.

  • We design and implement SafelyAdministrated—a Solidity abstract class that safeguards users of administrated ERC20 tokens from adversarial attacks or frivolous behavior of the tokens’ owners.

2 Background

Smart Contracts and EVM. A smart contract is a program deployed on a blockchain and executed by the blockchain’s virtual machine (VM). A smart contract consists of a set of functions that can be called through blockchain transactions. Most smart contracts are written in a high-level special-purpose programming language, such as Solidity or Vyper, and compiled into the bytecode for deployment and execution on a blockchain VM. The Ethereum Virtual Machine (EVM) is the blockchain VM for executing Ethereum smart contracts.

Externally Owned Account. The Ethereum blockchain has two types of accounts: smart contract accounts and Externally Owned Accounts (EOAs). Both EOAs and smart contract accounts are referenced by their 160-bit public addresses. EOAs can call the functions of smart contracts via signed transactions.

Solidity. Solidity is the most popular programming language for EVM smart contract development; its syntax is similar to that of JavaScript and C++. The source code of a smart contract written in Solidity must be compiled into bytecode before being deployed on the EVM. All smart contracts analyzed in this study are written in Solidity.

ERC20 Tokens. ERC20 is the most popular standard for implementing fungible tokens (Footnote 3) in Ethereum smart contracts. Some of the most traded alternative cryptocurrencies (altcoins), such as ChainLink and BinanceCoin, are ERC20-compatible smart contracts deployed on Ethereum Mainnet. The ERC20 standard defines an interface with 6 mandatory functions, 2 mandatory events, and 3 optional properties that a smart contract should implement in order to become an ERC20 token and interact with ERC20-compliant clients (Footnote 4).

OpenZeppelin Contracts. OpenZeppelin Contracts is a library of smart contracts that have been extensively tested for adherence to best security practices. These smart contracts are considered to be the de-facto standardized implementations of popular smart contract code patterns [4]. The OpenZeppelin project provides a rich code base for ERC20 token developers [2]. Most ERC20 tokens, as well as the administrated patterns in these tokens, are implemented by inheriting routines from the OpenZeppelin Contracts library.

3 Administrated ERC20 Patterns

In this section, we elaborate upon five general re-centralization patterns that we observe in Ethereum smart contracts (Footnote 5).

3.1 Self-destruction

The EVM opcode SELFDESTRUCT (Footnote 6) removes a smart contract from the blockchain. To give owners a further incentive to remove unused contracts, the address supplied as the argument of the SELFDESTRUCT call receives the entire Ether balance of the smart contract. In Solidity, the built-in function selfdestruct() initiates the removal of the smart contract. If this functionality is implemented, the administrator (or an attacker impersonating the administrator) can trigger it at any moment, effectively destroying all users’ assets with a single transaction. Figure 2 shows a real-world example of such a pattern.

Fig. 2. A snippet of an administrated self-destruction pattern in the contract deployed at 0xbF3d14995D4A4A719A3B9101DE60baa47De60F39.

3.2 Deprecation

With the exception of self-destruction, the source code of an Ethereum smart contract is immutable, which impedes the ability for developers to deliver new features or fix existing bugs. To address this limitation, some developers of smart contracts implement a bypass scheme, in which a contract can be declared as deprecated by the owner, resulting in the redirection of the users’ transactions towards functions of a new contract. The danger of this scheme stems from the fact that it grants the owner of the contract an ability to replace the code of some critical functions with arbitrary ones. Figure 3 shows a real-world example of the deprecation pattern.

Fig. 3. A snippet of an administrated deprecation pattern in the TetherUSD smart contract deployed at 0xdAC17F958D2ee523a2206206994597C13D831ec7, which allows the owner to effectively replace the code of the contract with arbitrary code.

3.3 Change of Address

Another administration strategy is the ability for the owner of a smart contract to change certain critical addresses, such as recipients of fees or accounts associated with certain roles. As shown in our previous study [12], a replacement of a public address in a smart contract can lead to an acquisition of the funds by the owner of the contract. Figure 4 demonstrates such an address changing pattern.

Fig. 4. A snippet of a change-of-address pattern in the smart contract deployed at 0x350BDC46d931712d83ef989725Ba4904C487F360. The exploitation of such a pattern has been demonstrated in previous research.

3.4 Change of Parameters

Another administration pattern is characterized by the owner changing certain parameters, which may affect the ability of a user of the contract to perform certain operations. For example, if the owner is allowed to arbitrarily change the amount of withdrawal fees, this parameter might be set to a very large value (e.g., 99%), effectively preventing the user from withdrawing funds. Another example of this pattern is shown in Fig. 5, where the owner of the contract exercises an unbounded power to manage administrators of the smart contract.

Fig. 5. A snippet of a change-of-parameters pattern in the smart contract deployed at 0x18c210013ea6cbe99b2dacdc9cfcb6e07458f0ca.

3.5 Minting and Burning

An increase of a token supply of an ERC20 contract is called token minting, and the reduction of supply of tokens is called burning. Since the entire supply of tokens is partitioned between owners in a way that there are no balances belonging to nobody, minting a token means to increase someone’s balance, and burning a token means to reduce someone’s balance. Although most tokens are minted or burned as a result of a certain event, such as token creation, token swap, crowdsale, or exchange into Ether balance, some contracts allow privileged users to arbitrarily mint or burn tokens, which is a dangerous action that even highly centralized commercial banks normally cannot do. Figure 6 demonstrates an example of the minting and burning pattern implemented in a deployed Ethereum smart contract.

Fig. 6. A snippet of the minting and burning patterns in the smart contract deployed at 0x82bfdd53dd95efa2c3e92543f28d46c566bf4b8a.

4 Administrated Tokens in the Wild

In this section, we use a pattern recognition method to search for administrated ERC20 tokens in the Ethereum Mainnet network, as shown in Fig. 7. We start by preprocessing all the input samples: we remove comments and extract the source code from multi-part JSON files (Footnote 7). Then we randomly select 385 samples from the 84,062 unique source code files and manually label each as one of two classes: a) administrated ERC20 tokens, and b) others. After that, we extract 385 9-dimensional feature vectors corresponding to the labeled samples, under the assumption that all the samples are independent and identically distributed (i.i.d.). We then use the 385 labeled samples and the corresponding feature vectors to evaluate the performance of 9 different classifiers using the K-fold method (with \(k=5\)). Next, we choose the best performing classifier, i.e., the one that demonstrated the highest accuracy during the evaluation stage (SVC). After that, we extract 84,062 feature vectors corresponding to the entire data set and train the SVC classifier with the 385 labeled samples. Under the i.i.d. assumption, we can now classify all the samples using the trained SVC model. Finally, we gather the output and analyze the results.

Fig. 7. General workflow of the analysis of administrated ERC20 tokens. The workflow includes 9 major steps. (1): Pre-process input samples to remove comments and parse multi-part JSON files. (2): Pick 385 samples from 84,062 unique source code files and manually assign them to two classes: a) administrated ERC20 tokens, and b) others. (3): Extract 385 feature vectors corresponding to the labeled samples. (4): Use the 385 labeled samples and the corresponding feature vectors to evaluate the performance of 9 different classifiers using the K-fold method (with \(k=5\)). (5): Choose the best performing classifier on the 385 labeled samples with the given 9 features. (6): Extract 84,062 feature vectors corresponding to the entire data set. (7): Train the classifier with the 385 labeled samples. (8): Classify all the samples using the trained classifier. (9): Analyze and report the results.
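The comment-removal part of step (1) can be illustrated with a short sketch. This is not the authors' preprocessing code; the function name is hypothetical, and the sketch assumes that only `//` and `/* */` comments need to be removed while string literals (which may legitimately contain `//`, as in URLs) must be preserved. Parsing of multi-part JSON files is not covered.

```python
def strip_solidity_comments(source: str) -> str:
    """Remove // and /* */ comments, copying string literals verbatim."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "\"'":
            # String literal: copy up to the matching quote, skipping escapes.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            out.append(source[i:j + 1])
            i = j + 1
        elif source.startswith("//", i):
            # Line comment: skip up to (not including) the newline.
            while i < n and source[i] != "\n":
                i += 1
        elif source.startswith("/*", i):
            # Block comment: skip past the closing */ (or to end of file).
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A plain regular expression would be shorter, but it would also truncate string literals that contain comment markers, so a small scanner is used here.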

4.1 Data Set

First, we gather 1,173,271 open-source smart contracts from Etherscan (Footnote 8) and, by removing duplicates using fdupes (Footnote 9), reduce the size of the database to 84,062 distinct smart contracts. Then, we remove all comments from the data points (i.e., source code files), and select 385 random contracts for manual labelling using the following formula:

$$n = \frac{N}{1+N \cdot (1-c)^2}. \tag{1}$$

Equation 1 is Slovin’s formula [6], which determines the representative sample size required for a given population size and desired confidence level. N is the size of the original population of smart contracts, i.e., \(N=84,062\), and n is the sample size chosen to represent the population. c is the confidence level, i.e., the certainty that the sample represents the population. We set the confidence level to approximately \(95\%\) (precisely, \(94.915\%\)), which yields the sample size \(n=385\); this size can be split into two partitions of 77 and 308 samples for k-fold evaluation with \(k=5\).
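As a quick check of the arithmetic, Eq. 1 can be evaluated directly. The function name below is illustrative and not part of the original tooling.

```python
def slovin_sample_size(population: int, confidence: float) -> float:
    """Evaluate Eq. 1: n = N / (1 + N * (1 - c)^2)."""
    margin = 1.0 - confidence
    return population / (1.0 + population * margin ** 2)

n = slovin_sample_size(84_062, 0.94915)
print(round(n))  # 385
print(round(n) // 5, round(n) - round(n) // 5)  # 77 308
```

With a confidence level of exactly 95%, the same formula gives approximately 398 samples, which does not divide evenly into five folds; the slightly lower value of 94.915% yields 385, i.e., five folds of 77 samples each.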

4.2 ERC20 Administration Features

Our knowledge of the administration features in ERC20 tokens stems from our manual analysis of the source code of around 3,800 Ethereum smart contracts. This analysis, which took more than 140 person-hours, allows us to recognize the existing administration patterns. As a result, we have developed 9 syntactic signatures, which we consider well-separated and independent because we have observed various combinations of these signatures in administrated smart contracts. This led us to design 9 syntactic features, denoted \(f_1 \ldots f_9\), each of which produces a binary value: 1 if the corresponding syntactic signature is present, and 0 if it is absent. Below is a brief description of the syntactic signatures that the 9 features correspond to.

\(\boldsymbol{f_1}\): ERC20 Interface Implementation. The goal of this research is to identify administrated ERC20 tokens. In order to separate ERC20 tokens from other types of smart contracts, the extractor of feature \(f_1\) detects the simultaneous presence of syntactic identifiers corresponding to the eight mandatory items of the ERC20 interface (6 functions and 2 events), as described in the EIP-20 standard.
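A minimal sketch of how such a check could look with Python's re module is given below. The regular expressions are assumptions made for illustration, not the authors' actual signatures. The function and event names are those listed in EIP-20. One known limitation of this simplified version is that it misses items exposed through public state variables (e.g., a public balanceOf mapping) rather than explicit function declarations.

```python
import re

# Mandatory items of the EIP-20 interface: 6 functions and 2 events.
ERC20_FUNCTIONS = ("totalSupply", "balanceOf", "transfer",
                   "transferFrom", "approve", "allowance")
ERC20_EVENTS = ("Transfer", "Approval")

def f1_erc20_interface(source: str) -> int:
    """Return 1 if all eight mandatory ERC20 items are declared, else 0."""
    has_functions = all(
        re.search(rf"\bfunction\s+{re.escape(name)}\s*\(", source)
        for name in ERC20_FUNCTIONS
    )
    has_events = all(
        re.search(rf"\bevent\s+{re.escape(name)}\s*\(", source)
        for name in ERC20_EVENTS
    )
    return int(has_functions and has_events)
```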

\(\boldsymbol{f_2}\): Administrated Self-destruction Signature. If the owner of a smart contract implements a self-destruction procedure, they may remove the contract from the Ethereum ecosystem with a single transaction, simultaneously acquiring all the Ether balance of the contract. Feature \(f_2\) detects such a signature, both in old versions of Solidity and the modern ones (the exact procedure differs for different versions of the language).
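The version difference can be illustrated with a simplified sketch. It assumes that older Solidity code invokes the deprecated alias suicide(...) while newer code uses selfdestruct(...). Unlike the feature described above, this sketch only detects the presence of the call and does not check whether access to it is restricted to a privileged user.

```python
import re

# Matches both the current built-in and the older, deprecated alias.
SELF_DESTRUCT_CALL = re.compile(r"\b(?:selfdestruct|suicide)\s*\(")

def f2_self_destruction(source: str) -> int:
    """Return 1 if the source contains a self-destruction call, else 0."""
    return int(bool(SELF_DESTRUCT_CALL.search(source)))
```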

\(\boldsymbol{f_3}\): Pausable Functionality Signature. The owner of a smart contract can suspend any operations with the contract at will for an indefinite period of time. Although pausing a smart contract does not allow the owner to directly acquire Ether or token balances, it may have dire consequences if the owner’s private key is stolen by an attacker or lost while the token is paused. Feature \(f_3\) is intended to identify signatures of such pausable tokens.

\(\boldsymbol{f_4}\): Contract Deprecation Signature. Since Ethereum smart contracts are non-modifiable, the only means of upgrading the contract is to deprecate the existing contract and refer the users to the new one using inter-contract calls (ICCs). Unfortunately, this procedure allows the owner of the smart contract to effectively introduce any arbitrary code. Feature \(f_4\) extracts the signatures of contract deprecation functionality, which is one of the most dangerous patterns in administrated ERC20 tokens.

\(\boldsymbol{f_5}\): Minting and Burning Signatures. The ability of a privileged user to arbitrarily create and remove tokens, known as minting and burning respectively, is a major concern associated with administrated ERC20 tokens. Feature \(f_5\) represents the signature of minting and/or burning functionality in the smart contract whose execution can only be triggered by a privileged user (administrator).

\(\boldsymbol{f_6}\): Role-Restricted Transfers and Withdrawals. Another signature of an administrated ERC20 token is the ability of a privileged user to perform arbitrary token or Ether transfers and withdrawals of funds that do not belong to that user. Feature \(f_6\) corresponds to the syntactic signature related to such transfer and withdrawal functionality under privileged access.

\(\boldsymbol{f_7}\): Function-Disabling Modifiers. Some function modifiers do not directly check for the identity of privileged users; instead, they use the parameters previously changed by an administrator to decide whether the function needs to be executed. Feature \(f_7\) is related to such modifiers that are capable of disabling the execution of a function based on a parameter adjustable by the contract’s privileged user.

\(\boldsymbol{f_8}\): Direct Checks of a Sender Address. Although modifiers are a popular means of granting privileged access to certain functions of a smart contract, some administrated contracts use direct checks of the msg.sender or tx.origin values. Feature \(f_8\) targets the direct (i.e., bypassing Solidity modifiers) checks of the transaction sender’s identity, which predominantly make sense in the context of administrated smart contracts.
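One possible way to approximate this signature is to look for equality or inequality comparisons that involve the sender. The pattern below is an illustrative assumption, not the authors' extractor. Solidity exposes the immediate caller as msg.sender and the transaction originator as tx.origin, so the sketch matches these two names.

```python
import re

# A comparison (== or !=) with msg.sender or tx.origin on either side.
SENDER_CHECK = re.compile(
    r"\b(?:msg\.sender|tx\.origin)\s*[!=]=|[!=]=\s*(?:msg\.sender|tx\.origin)\b"
)

def f8_direct_sender_check(source: str) -> int:
    """Return 1 if the sender address is compared directly, else 0."""
    return int(bool(SENDER_CHECK.search(source)))
```

Plain uses of the sender, such as indexing a balance mapping with msg.sender, are deliberately not matched because they do not compare the sender against a stored privileged address.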

\(\boldsymbol{f_9}\): Freezing, Halting, or Killing Methods. Certain frequently occurring function names, such as “freeze”, “halt”, and “kill”, empirically correlate strongly with the administrated property of ERC20 tokens. Feature \(f_9\) detects the presence of such frequently used functions, which almost always indicate an administration pattern.
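A name-based check of this kind might be sketched as follows. The keyword list contains only the three names quoted above; the complete list used in the study is not given, so this is an assumption. Substring matching is a simplification that can produce false positives on unrelated names.

```python
import re

# Function declarations whose name contains one of the quoted keywords.
ADMIN_FUNCTION_NAME = re.compile(
    r"\bfunction\s+\w*(?:freeze|halt|kill)\w*\s*\(", re.IGNORECASE
)

def f9_freeze_halt_kill(source: str) -> int:
    """Return 1 if a freeze/halt/kill-style function is declared, else 0."""
    return int(bool(ADMIN_FUNCTION_NAME.search(source)))
```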

4.3 Classifier Evaluation and Model Selection

We use 385 manually labeled samples to evaluate the performance of 9 popular classifiers using the K-fold method with \(k=5\). Table 1 summarizes the classification models used for evaluation and the accuracy of each model under K-fold evaluation with the 385 labeled samples. The evaluation demonstrates that 8 out of 9 classifiers stay within the \(95\%\) to \(97\%\) accuracy range; the exception is the Gaussian Naive Bayes classifier, whose accuracy is slightly above \(61\%\).
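The K-fold procedure itself can be shown with a dependency-free sketch. The study used scikit-learn; the helper names below are hypothetical, and the majority-class baseline is only a stand-in for a real classifier.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and return k (train, test) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        folds.append((indices[:start] + indices[stop:], indices[start:stop]))
        start = stop
    return folds

def cross_val_accuracy(train, X, y, k=5, seed=0):
    """Average test accuracy; train(X, y) must return a predict(x) function."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        predict = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)

def majority_baseline(X, y):
    """Hypothetical stand-in classifier: always predict the most common label."""
    label = max(set(y), key=y.count)
    return lambda x: label
```

With 385 samples and \(k=5\), every fold trains on 308 samples and tests on the remaining 77, so each labeled sample is used for testing exactly once.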

Table 1. Tested classifiers.

4.4 Implementation and Evaluation of the Analysis Workflow

We implement the extractors of all 9 syntactic features using Python 3.8.5 and the re regular expression library. We implement the K-fold evaluation and dataset analysis using Python 3.8.5 with the scikit-learn 0.24.1 and numpy 1.20.0 libraries. We randomly selected 385 smart contracts from the i.i.d. set of 84,062 and labeled them manually based on human comprehension of the semantics of each smart contract, which took approximately 40 person-hours of total effort.

4.5 Results

Out of 84,062 evaluated smart contracts, 54,626 have been identified as ERC20 tokens, which is around 64.6%. As many as 39,034 contracts have been classified as administrated ERC20 tokens (by counting the occurrences of \(f_1 = 1\)), which is 57.96% of all the evaluated smart contracts, and 89.76% of all ERC20 tokens. Consequently, only about 10% of all ERC20 tokens are non-administrated, i.e., exhibit full decentralization and permissionless design, while the vast majority of the tokens are tightly controlled by their owners and other privileged users, effectively overriding the decentralization capability of the hosting blockchain. Figure 8 shows the summary of the results of our analysis.

Fig. 8. Results of processing 84,062 unique source codes of Ethereum smart contracts using the SVC classifier and the 9 developed syntactic features.

5 SafelyAdministrated Library

Existing administrated ERC20 tokens are generally unsafe because they are loosely regulated and their functionality often hinges upon a single account’s private key, which can be abused by its owner or stolen by an adversary. To mitigate such an unsafe arrangement without denouncing the idea of administration or boycotting the administrated tokens, we propose a novel solution for making these smart contracts safe. As shown in Sect. 4, most ERC20 tokens are administrated, and therefore potentially unsafe. However, due to their ubiquity, it would be naive to urge users to boycott 9 out of 10 of currently deployed ERC20 tokens. In this work, we propose a feasible “evolutionary” fix to the existing problem. Specifically, we realize that administrated patterns can be used by token owners without jeopardizing the safety of the contract and requiring trust from the users. For that, the current primitive administrated routines can be re-implemented to incorporate three novel concepts: deferred maintenance, board of trustees, and safe pausing. The details of these three approaches are explained below.

5.1 Deferred Maintenance

The owners of existing administrated ERC20 tokens can call the managerial functions without any announcement. To prevent unannounced actions, the SafelyAdministrated library implements a mechanism of deferred maintenance, which announces a maintenance action to the users and enacts it only after a certain delay. For example, if the contract is about to be upgraded, the users of the contract are notified and can decide whether they agree with the upgrade. If the users disagree with the upgrade, they may safely quit (i.e., sell or transfer their tokens) before the action takes effect.
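The announce-then-enact logic can be modeled in a few lines. This is a Python model of the idea, not the Solidity code of the library; the class, method, and parameter names are hypothetical, and time is passed in explicitly where a contract would read the block timestamp.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class DeferredMaintenance:
    delay: int  # assumed parameter: seconds between announcement and enactment
    announced: Dict[str, int] = field(default_factory=dict)

    def announce(self, action: str, now: int) -> int:
        """Publicly schedule an action; returns the earliest enactment time."""
        ready_at = now + self.delay
        self.announced[action] = ready_at
        return ready_at

    def execute(self, action: str, now: int) -> bool:
        """Enact an action only if it was announced and the delay has elapsed."""
        ready_at = self.announced.get(action)
        if ready_at is None or now < ready_at:
            return False
        del self.announced[action]
        return True
```

Between the announcement and the enactment time, users who disagree with the pending action still have the full delay to sell or transfer their tokens.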

5.2 Contract Board of Trustees

In most administrated smart contracts, the privileged user (administrator) has the sole power to perform critical actions upon the smart contract, which requires trust from the users of the contract. Moreover, if the private key of the smart contract’s administrator is stolen, the attacker becomes the administrator of the contract. Essentially, the safety of the contract often hinges on a single private key belonging to a single person, which is the major concern about administrated smart contracts. The contract board of trustees splits the administrative power among multiple private keys held by different parties, such that maintenance actions are only possible through a voting consensus with a pre-determined threshold.
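Threshold voting of this kind can be sketched as a small Python model. The names are hypothetical and the actual library interface may differ; the sketch assumes that each trustee has one vote per action and that the vote reaching the threshold is the one that triggers the action.

```python
class BoardOfTrustees:
    def __init__(self, trustees, threshold):
        self.trustees = frozenset(trustees)
        if not 1 <= threshold <= len(self.trustees):
            raise ValueError("threshold must be between 1 and the board size")
        self.threshold = threshold
        self.votes = {}  # action -> set of trustees who approved it

    def vote(self, trustee, action) -> bool:
        """Record an approval; return True once the threshold is reached."""
        if trustee not in self.trustees:
            raise PermissionError("only trustees may vote")
        approvals = self.votes.setdefault(action, set())
        approvals.add(trustee)  # a repeated vote by the same trustee counts once
        if len(approvals) >= self.threshold:
            del self.votes[action]  # reset so the action needs a fresh vote next time
            return True
        return False
```

With a board of three and a threshold of two, a single stolen key is no longer sufficient to enact a maintenance action.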

5.3 Safe Pause

The ability to pause the execution of transactions in a smart contract is not necessarily a whimsical action of the contract administrator. For example, this may be a necessary action upon discovery of a zero-day vulnerability: by pausing transactions, the administrator of the contract may prevent the exploitation of such a vulnerability. However, an indefinite pause may also be abused by the contract administrator, or it can be triggered by an adversary who stole the private key of the administrator’s account. To prevent the adverse effects of the pause functionality, we introduce a safe pause routine, which freezes all transactions in the smart contract and forces an un-freeze after a certain deadline. Moreover, once the contract is un-frozen, it cannot be frozen again for some time. This way, any of the trustees of the contract can enact an emergency pause, but no one is able to keep the contract paused indefinitely.
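The forced un-freeze and the subsequent cooldown can be modeled as follows. This is a Python model with hypothetical names and parameters, not the library's Solidity code; time is again passed in explicitly.

```python
class SafePause:
    def __init__(self, max_pause, cooldown):
        self.max_pause = max_pause  # assumed: maximum pause duration, seconds
        self.cooldown = cooldown    # assumed: minimum gap after a pause ends
        self.paused_until = None    # deadline of the most recent pause

    def is_paused(self, now) -> bool:
        """The contract un-freezes automatically once the deadline passes."""
        return self.paused_until is not None and now < self.paused_until

    def pause(self, now) -> bool:
        """Start a pause unless one is active or the cooldown is still running."""
        if self.paused_until is not None and now < self.paused_until + self.cooldown:
            return False
        self.paused_until = now + self.max_pause
        return True
```

Because the deadline is fixed when the pause starts and a new pause is rejected during the cooldown, a compromised key can delay transactions for a bounded time only.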

Table 2. Inheritable interfaces of SafelyAdministrated abstract class.

5.4 Implementation

We implement SafelyAdministrated as an abstract Solidity class, which includes 6 functions, 3 modifiers, and 5 events, summarized in Table 2. We implemented a testing ERC20 token that inherits the SafelyAdministrated contract, compiled it using Solc 0.8.1, and thoroughly tested its functionality to confirm that SafelyAdministrated allows an ERC20 token to be administrated in a safe manner.

5.5 Limitation

One limitation of SafelyAdministrated is that the trustee whose vote attains the voting threshold effectively pays the fees for executing the maintenance transaction, while the other trustees pay only for recording their votes. Although we assume that this unfairness is unlikely to be important in most cases, we leave the implementation of fee reimbursement for future work.

6 Related Work

Currently, the major concern about the safety of smart contracts comes from their security vulnerabilities. Researchers have proposed automated tools for detecting known smart contract vulnerabilities. Some notable security scanners for Ethereum include Oyente [13], Mythril [1], and Vandal [5]. Tsankov et al. [16] propose Securify, a tool that analyzes the bytecode of Ethereum smart contracts to detect patterns associated with known security vulnerabilities. Torres et al. [15] present a taxonomy of smart contract honeypots, which are deceptive smart contracts targeting users who attempt to exploit known vulnerabilities of smart contracts. Recently, Chen et al. proposed TokenScope [7], an automated tool that detects discrepancies between syntax and semantics in the functions of ERC20 tokens. In this work, we reach beyond security vulnerabilities and explore a generally overlooked safety issue in smart contracts, i.e., administrated patterns that allow owners of ERC20 tokens (or adversaries who steal the owner’s account private key) to cause mass damage to the token users.

The influence of private actors on blockchain resources has been a subject of concern for many years. Raman et al. [14] conduct a case study of decentralized web applications and identify a prevalence of re-centralization of such apps. Griffin et al. [11] discover that the TetherUSD ERC20 token has been used for manipulating the price of cryptocurrencies. In this work, we expand the discussion about the re-centralization and private manipulation of services that are intended to be decentralized to embrace the realm of ERC20 tokens.

The public trust towards administrated ERC20 tokens may be indicative of a well-studied irrational or semi-rational human behavior. In our previous research [12], we explore social engineering attacks in Ethereum smart contracts by demonstrating how visual cognitive bias and confirmation bias lead a user into engaging with a malicious smart contract. Fenu et al. [9] demonstrate the irrational behavior exhibited by many people when engaging with high-risk smart contracts involved in initial coin offerings (ICOs). In this work, we scrutinize a new facet of semi-rational human behavior: the false assumption that most smart contracts are decentralized, permissionless, and ungoverned just because they are deployed on a blockchain that holds these properties.

Previous studies proposed smart contract-level multi-signature voting schemes. ÆGIS [10] implements a voting-based mechanism, in which trusted experts vote for a security patch. Unfortunately, the voting mechanism in ÆGIS has been designed for a different context and cannot be applied, even with modifications, to trustee-based contract maintenance scenarios. Christodoulou [8] introduces a decentralized voting scheme similar to the Board of Trustees used in this work. However, all the above solutions are domain-specific and cannot be directly used for general cases, unlike the SafelyAdministrated library.

7 Conclusion

Unlike banks and other financial institutions, smart contracts are weakly regulated or not regulated at all. Simultaneously, an ERC20 token is often owned by a single account, the security of which hinges on a single private key. At the same time, we observe that the market capitalization of some tokens, such as USDT and BNB, reaches billions of dollars, which means that if the administrator’s private key is stolen or abused, all the funds of all users in the contract might be stolen immediately. ERC20 fungible tokens have been a hope for the next-generation tokenized economy. However, in this research we demonstrate that approximately 9 out of 10 ERC20 tokens are administrated assets that are generally less secure than traditional financial institutions and accounts. Instead of stigmatizing the widespread administration of the tokens, we deliver a solution for honest token owners to achieve their goals in a way that is safe for both them and the users, by implementing a novel contract ownership mechanism that effectively prevents a single point of security failure and enforces prior notice of maintenance. At the time of writing, there is no affiliation or sponsorship, current or arranged, between the authors of this work and any banks, online payment systems, or smart contract developers mentioned or implied in this research.