1 Introduction

1.1 Blockchain and Ethereum

Blockchain. Public blockchain technology answers the requirement to register a series of events in an open, decentralised, available and immutable platform. Although the word blockchain does not appear in it, the seminal paper about Bitcoin by Satoshi Nakamoto in 2008 [1] discloses the concept behind it. “The longest chain wins” and, consequently, “the largest devoted computing power wins” summarise the functioning of Bitcoin.

Ethereum. Five years later, Vitalik Buterin invented and co-founded Ethereum. This public blockchain evolves Nakamoto’s original blockchain concept into a Turing-complete platform able to run decentralised applications by introducing smart contracts, i.e., code that runs on top of the blockchain [2]. The possibility to script any logic in a blockchain gave birth to a multitude of tokens, both fungible and non-fungible (NFTs). While Bitcoin bases its transactions on the unspent transaction output (UTXO) model, with scripts for locking and unlocking the outputs, Ethereum uses an account-based model with a balance associated with each address, which also allows the implementation of smart contracts [2].

ERC-20 Tokens. Ethereum Request for Comments 20 (ERC-20) is the Ethereum standard for fungible non-native tokens [3]. Fungible refers to tokens that are identical and interchangeable within the same currency. ERC-20 provides an application programming interface (API) to transact with these tokens. It defines methods such as transfer(), balanceOf() and approve(). The four tokens of our study are ERC-20 Ethereum tokens.
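To make the role of these three methods concrete, the following is a minimal in-memory sketch of an ERC-20-style ledger. It is written in Python rather than Solidity and is purely illustrative: the class name `ToyToken`, the explicit `sender`/`owner` arguments (which on-chain would be the implicit message sender) and the error handling are assumptions of this sketch, not part of the standard's text.

```python
class ToyToken:
    """Illustrative, in-memory stand-in for an ERC-20 ledger (not Solidity)."""

    def __init__(self, initial_holder: str, supply: int) -> None:
        self.balances = {initial_holder: supply}
        # (owner, spender) -> amount the spender may still move
        self.allowances = {}

    def balance_of(self, owner: str) -> int:
        """Mirror of balanceOf(): unknown addresses hold zero tokens."""
        return self.balances.get(owner, 0)

    def transfer(self, sender: str, to: str, value: int) -> bool:
        """Mirror of transfer(): move `value` tokens from sender to `to`."""
        if value < 0 or self.balance_of(sender) < value:
            raise ValueError("insufficient balance")
        self.balances[sender] = self.balance_of(sender) - value
        self.balances[to] = self.balance_of(to) + value
        return True

    def approve(self, owner: str, spender: str, value: int) -> bool:
        """Mirror of approve(): let `spender` move up to `value` tokens."""
        self.allowances[(owner, spender)] = value
        return True

    def transfer_from(self, spender: str, owner: str, to: str, value: int) -> bool:
        """Mirror of transferFrom(): spend part of a previously approved allowance."""
        allowed = self.allowances.get((owner, spender), 0)
        if allowed < value:
            raise ValueError("allowance exceeded")
        self.allowances[(owner, spender)] = allowed - value
        return self.transfer(owner, to, value)


token = ToyToken("alice", 100)
token.transfer("alice", "bob", 30)          # a direct "1 to 1" value transfer
token.approve("alice", "exchange", 50)      # alice authorises a contract-like spender
token.transfer_from("exchange", "alice", "carol", 20)
```

Each successful `transfer` or `transfer_from` call corresponds to one value transfer between two addresses, which is the kind of event the transaction networks in this paper are built from.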

Network Participants: Humans and Code. In the Ethereum network transactions occur between addresses, each of which has an associated balance. While in Bitcoin transfers can occur from “1 to 1”, “n to 1”, “1 to n” and “n to n” sender and destination addresses due to its use of the UTXO model, in Ethereum all transactions happen “1 to 1” between a sender address and a destination address due to its use of the account model. Each account has an address derived from a public key and belongs to one of two types: i) externally owned accounts (EOAs), which are controlled by users or, alternatively, by code running outside of the blockchain, and ii) smart contracts. Smart contracts expose functions that can be invoked by EOAs or other contracts, with the distinction that smart contracts cannot initiate transactions themselves: only EOAs can initiate a chain of smart contract executions [2, 4].

1.2 Transaction Networks in Blockchain Systems

Nodes and Edges: Addresses and Transactions. Complex network analysis studies systems composed of a large number of nodes connected to one another via edges [5]. A long list of authors has studied public blockchain networks via network science [6,7,8,9,10]. We use network analysis to understand the structure of the transaction networks of the four mentioned ERC-20 tokens. The nodes represent the addresses that intervene in these networks and the edges represent the value transfers between them.

Network Properties. Key network science properties that characterise a complex network are, among others, degree [11], density, and largest connected component. In addition to these, in this paper we analyse two more: preferential attachment and network dismantling. Preferential attachment relates to the way the network grows. Linear preferential attachment leads to a scale-free network that displays a power law behaviour. Network dismantling, the opposite of network percolation, provides insights into how the network endures the elimination of highly connected nodes [12,13,14]. As Ethereum is a public blockchain, we extract all transaction-relevant data related to AMP, BAT, DAI and UNI from a fully synced Ethereum node. We build the four corresponding transaction networks and calculate their key network properties with a special focus on the smart contract addresses.

1.3 Four Tokens Used as DeFi Collateral

DeFi changes the paradigm in finance. It shifts financial activities such as lending and borrowing from a traditionally centralised approach to a blockchain-based distributed approach. The logic required to run these financial processes is implemented in smart contracts running predominantly on the Ethereum platform. We study the transaction networks of four types of Ethereum-based ERC-20 tokens that can be used as collateral in DeFi: a utility token (BAT), an algorithmic stablecoin (AMP), a multi-currency pegged algorithmic stablecoin (DAI) and a governance token (UNI).

Ampleforth (AMP): An algorithmic stablecoin pegged to the USD that achieves stability by adapting its supply to price changes without a centralised collateral. The protocol receives exchange-rate information on USD prices from trusted oracles and changes the number of tokens each user holds accordingly [15]. AMP was launched in June 2019 and has a market capitalisation of over USD 2B as of September 2021, which places it in the top 100 cryptocurrencies [16].

Basic Attention Token (BAT): A utility token aiming to improve efficiency in digital advertising via its integration with the Brave browser. Users are awarded BATs for paying attention to ads. BAT allows users to maintain control over the quantity and type of ads they consume, while advertisers can achieve better user targeting and reduced fraud rates [17]. BAT had an initial coin offering (ICO) in May 2017 and as of September 2021 it has a market capitalisation of roughly USD 1.1B, which places it in the top 100 cryptocurrencies [16].

Dai (DAI): A multi-currency pegged algorithmic stablecoin token [18] launched in 2017 which, like AMP, uses smart contracts on the Ethereum network to keep its value as close as possible to the USD. Users can deposit ETH as collateral and obtain a loan in DAI, and the stability of DAI is achieved by controlling the type of accepted collateral, the collateralisation ratio and interest rates. In November 2019 DAI transitioned from a single-collateral model (ETH) to a multi-collateral model (ETH, BAT and USDC among other tokens), which we analyse in this paper. As of September 2021 DAI has a market capitalisation of USD 6.5B [16].

Uniswap (UNI): A decentralised finance protocol [19] to exchange ERC-20 tokens on the Ethereum network. Unlike traditional exchanges it does not have a central limit order book but rather liquidity pools: pairs of tokens provided by users (liquidity providers) which other users can then buy and sell. The UNI governance token was launched in September 2020 [19]. It is currently ranked among the top 11 cryptocurrencies by market capitalisation, which amounts to almost USD 14B as of September 2021 [16].

2 Data Description

Table 1. Summary of the datasets curated for the four tokens used in this study. We extract all transactions (Tx) from the ETH blockchain. The last two columns show the Ethereum blocks containing the transactions used in this study for each token and their time span. Although the DAI token launched in 2017, we collect transaction data only from its move to a multi-collateral model in 2019.
Table 2. Spearman correlation \(\rho _{s}\) between in-degree \(k_{in}\) and out-degree \(k_{out}\) for each token. The relation is stronger for AMP and DAI than it is for BAT and UNI. Observing a highly irregular pattern for low in-degree nodes, we suspect that the correlation for nodes with higher degree could be stronger. Computing the Spearman correlation for \(k_{in}>100\) confirms this.
Table 3. Scale-free networks are characterised by a power law degree distribution \(p_{k} \sim k^{-\gamma } \). In the definition of Barabasi the exponent should satisfy \(2 \le \gamma \le 3\), as in [20, 21]. This condition holds in only a few cases, namely for BAT and DAI in \(k_{out}\). According to the classification in [22], most of our cases fall under the weak and weakest conditions for scale-free networks, in which the degree distribution mostly follows a power law. \(x_{min}\) is the minimum x value where the fit starts.

We construct an aggregated transaction network \(G^{S}(t)\) represented with a single directed graph encompassing the full available history for each of the four Ethereum tokens. The nodes of the network represent addresses participating in transfers. Every edge of the network represents all the transfers that happen between the two involved addresses. We analyse more than 700k transactions (Tx) in AMP, 3M Tx in BAT, 8M Tx in DAI and 2M Tx in UNI, as displayed in Table 1.

$$\begin{aligned} G^{S}(t) = \left( \mathcal {V}^{S}(t), \mathcal {E}^{S}(t)\right) \text {for symbol } S \in \left\{ \hbox {AMP}, \hbox {BAT}, \hbox {DAI}, \hbox {UNI} \right\} \end{aligned}$$

The set of nodes \(\mathcal {V}^{S}(t)\) corresponds to the addresses that have been included in at least one transaction of symbol S since time t. The set of edges \(\mathcal {E}^{S}(t)\) consists of unweighted, directed edges between all pairs of addresses that have exchanged at least one transfer. In edge (transaction j) \((j_1,j_2)\), node \(j_1\) is the sender and \(j_2\) is the recipient (Tables 2 and 3).
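The construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is taken to be an iterable of `(sender, recipient)` pairs already extracted from the node, and the function names `build_transaction_network` and `degrees` are hypothetical helpers, not the authors' code.

```python
from collections import defaultdict


def build_transaction_network(transfers):
    """Collapse a stream of (sender, recipient) transfers into a node set
    and a set of unweighted, directed edges (repeated transfers between the
    same ordered pair of addresses become one edge)."""
    nodes = set()
    edges = set()
    for sender, recipient in transfers:
        nodes.add(sender)
        nodes.add(recipient)
        edges.add((sender, recipient))
    return nodes, edges


def degrees(edges):
    """Return (k_in, k_out) dictionaries computed from the directed edges."""
    k_in = defaultdict(int)
    k_out = defaultdict(int)
    for sender, recipient in edges:
        k_out[sender] += 1
        k_in[recipient] += 1
    return dict(k_in), dict(k_out)


# Toy input with made-up addresses; the second transfer repeats the first edge.
transfers = [("0xA", "0xB"), ("0xA", "0xB"), ("0xB", "0xC"),
             ("0xC", "0xA"), ("0xD", "0xA")]
nodes, edges = build_transaction_network(transfers)
k_in, k_out = degrees(edges)
```

In this toy input five transfers yield four nodes and four edges, because the repeated transfer from `0xA` to `0xB` reuses an existing edge.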

2.1 Preferential Attachment

Preferential attachment is the network growth mechanism where the probability of forming a new link is proportional to the degree of the target node. In mathematical terms, we describe the probability \(\pi \) of forming a new link to an existing node j with in-degree \(k_{in,j}\) or from an existing node j with out-degree \(k_{out,j}\) in the following way:

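$$\begin{aligned} \pi (k_{in,j}) = \frac{k_{in,j}^{\alpha ^{in}}}{\sum _{i} k_{in,i}^{\alpha ^{in}}} \end{aligned}$$
(1)

$$\begin{aligned} \pi (k_{out,j}) = \frac{k_{out,j}^{\alpha ^{out}}}{\sum _{i} k_{out,i}^{\alpha ^{out}}} \end{aligned}$$
(2)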

where \(\alpha ^{in}> 0\). If \(\alpha ^{in}= 1\) the preferential attachment is linear. If \(\alpha ^{in}< 1\) it is sub-linear, and when \(\alpha ^{in}>1\) it is super-linear. When the probability of forming the new link is linear, preferential attachment leads to a scale-free network. When the attachment is super-linear, very few nodes (hubs) tend to connect to all nodes of the network. These hubs are of crucial importance in the network. We further extend this model to the out-degree \(k_{out,j}\) of an existing node j, with exponent \(\alpha ^{out}\), so that preferential attachment in a directed network also captures how out-degree accrues and consolidates over time.
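The effect of the exponent can be illustrated numerically. The sketch below assumes the usual non-linear preferential attachment form, in which the probability of picking a node is its degree raised to \(\alpha \) and normalised over all nodes; the function name `attachment_probabilities` and the toy degrees are illustrative assumptions.

```python
def attachment_probabilities(degrees, alpha):
    """Probability of attaching to each node, proportional to degree**alpha.

    `degrees` maps node -> degree (in-degree or out-degree, depending on
    which end of the new edge is being selected)."""
    weights = {node: k ** alpha for node, k in degrees.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all attachment weights are zero")
    return {node: w / total for node, w in weights.items()}


# One hub with in-degree 8 and two peripheral nodes with in-degree 1.
toy_degrees = {"hub": 8, "a": 1, "b": 1}
sub_linear = attachment_probabilities(toy_degrees, 0.5)
linear = attachment_probabilities(toy_degrees, 1.0)
super_linear = attachment_probabilities(toy_degrees, 2.0)
```

With these toy degrees the hub's share of new links is 8/10 in the linear case, smaller for the sub-linear exponent and larger (64/66) for the super-linear one, which is the mechanism behind the formation of hubs.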

When a new directed edge is added to the network, we assume that the source node j is selected with a probability which is a function (solely) of its out-degree \(k^*_{out}\), i.e. \({\pi \left( k^*_{out}\right) }\). Likewise, \(\pi (k^*_{in})\) denotes the probability that the new link is created to a node with in-degree \(k^*_{in}\), as in the original model. Since this probability is time-dependent, we use the rank function \({R(\alpha ; k^*,t)}\), computed for each link addition to a node with in-degree \(k^*\) (or from a node with out-degree \(k^*\)) at each time t. Specifically:

$$\begin{aligned} R(\alpha ; k^*,t)= & {} \frac{ \sum _{k = 0}^{k^*-1}n(k, t) \, k^\alpha }{\sum _{k } n(k, t) \, k^\alpha } . \end{aligned}$$
(3)

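Here \(n(k, t)\) is the number of nodes with degree k at time t. Thus, the sum in the numerator runs over all nodes whose in-degree is lower than \(k^*_{in}\) (or, in the out-degree case, whose out-degree is lower than \(k^*_{out}\)), while the sum in the denominator runs over all nodes. When a new edge is created, if the target or the source node is drawn with the probability given by the true exponent \(\alpha ^{in}_o\) or \(\alpha ^{out}_o\), and we substitute that exponent into Eq. 3, the resulting values of R are uniformly distributed in [0, 1].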
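A direct transcription of Eq. 3 may help to read it. The sketch below assumes the degree distribution at time t is available as a mapping from degree k to the number of nodes n(k, t); the name `rank_function` and the toy counts are hypothetical.

```python
def rank_function(alpha, k_star, degree_counts):
    """Evaluate R(alpha; k*, t) for one link addition.

    `degree_counts` maps a degree k to n(k, t), the number of nodes that
    have that degree just before the new link is added."""
    denominator = sum(n * k ** alpha for k, n in degree_counts.items())
    if denominator == 0:
        raise ValueError("no node has a non-zero attachment weight")
    # Only nodes with degree strictly below k* contribute to the numerator.
    numerator = sum(n * k ** alpha
                    for k, n in degree_counts.items() if k < k_star)
    return numerator / denominator


# Toy snapshot: two nodes of degree 1, one of degree 2, one of degree 4.
snapshot = {1: 2, 2: 1, 4: 1}
r_low = rank_function(1.0, 2, snapshot)   # link attaches to the degree-2 node
r_high = rank_function(1.0, 4, snapshot)  # link attaches to the degree-4 node
```

For this snapshot and \(\alpha = 1\) the denominator is 8, so `r_low` is 2/8 and `r_high` is 4/8: links to higher-degree nodes map to larger values of R.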

To obtain the value of \(\alpha _o\), we measure the corresponding Kolmogorov-Smirnov (K-S) goodness of fit, i.e., the difference between the empirical cumulative distribution function (ECDF) of R calculated with different exponents \({\alpha }\) and the theoretical linear CDF of the uniform distribution. The value \(\alpha _o\) that minimises the distance to the uniform distribution is the best fit for the exponent, which determines the kind of preferential attachment in our transaction network. We sample 10% of all the edges while building the network and calculate the K-S error between the empirical distribution and a theoretical one, in this case a pure power law, for a range of \(\alpha \in [0,2.5]\) to find an error-minimising \(\alpha \). We observe, consistently in all four tokens, that the minimum error is achieved at \(\alpha \) around 1.0 for the out-degree and around 1.1 for the in-degree. A value of \(\alpha > 1\) for the in-degree indicates a super-linear preferential attachment in the network, i.e., a small number of nodes attract most of the connections in the network and will eventually form super-hubs. This is another indication of the rising centralisation in the network, caused by the presence of key smart contract and exchange nodes. For all the tokens we have super-linear preferential attachment: AMP has \(\alpha ^{in}\) 1.05 (error 0.143) and \(\alpha ^{out}\) 1.02 (error 0.174); BAT has \(\alpha ^{in}\) 1.15 (error 0.198) and \(\alpha ^{out}\) 1.1 (error 0.226); DAI has \(\alpha ^{in}\) 1.1 (error 0.099) and \(\alpha ^{out}\) 1.05 (error 0.126); UNI has \(\alpha ^{in}\) 1.05 (error 0.227) and \(\alpha ^{out}\) 1.02 (error 0.257). The evolution in time, with non-cumulative time windows, can be seen in Fig. 2.
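The fitting procedure can be sketched as a grid search: for each candidate exponent, compute R for every sampled link addition and measure the K-S distance of those values to the uniform distribution on [0, 1]. The code below is a simplified illustration, not the authors' implementation; the helper names, the representation of each sampled event as a `(k_star, degree_counts)` pair and the toy inputs are assumptions.

```python
def ks_distance_to_uniform(samples):
    """Kolmogorov-Smirnov distance between the ECDF of `samples`
    and the CDF of the uniform distribution on [0, 1]."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # The ECDF jumps from i/n to (i+1)/n at xs[i]; check both sides of the jump.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def rank_value(alpha, k_star, degree_counts):
    """R(alpha; k*, t) for one event; degree_counts maps k -> n(k, t)."""
    denominator = sum(n * k ** alpha for k, n in degree_counts.items())
    numerator = sum(n * k ** alpha
                    for k, n in degree_counts.items() if k < k_star)
    return numerator / denominator


def best_alpha(events, alphas):
    """Return the exponent with the smallest K-S error and all errors.

    `events` is a list of (k_star, degree_counts) pairs, one per sampled
    link addition, recorded just before the link was added."""
    errors = {}
    for alpha in alphas:
        ranks = [rank_value(alpha, k_star, counts) for k_star, counts in events]
        errors[alpha] = ks_distance_to_uniform(ranks)
    return min(errors, key=errors.get), errors


# Tiny, hand-made set of events on a fixed snapshot (illustrative only).
snapshot = {1: 2, 2: 1, 4: 1}
events = [(2, snapshot), (4, snapshot)]
alpha_o, errors = best_alpha(events, [0.0, 1.0, 2.0])
```

On real data the grid would cover the range of exponents considered in the text, and the events would come from the sampled edges in the order in which they were added to the network.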

3 Methods and Implementation

Figure 1 plots network density as a function of network size for all four tokens. Density scales inversely with network size, \(d \propto N^{-1}\). This shows that the number of edges grows linearly with the size of the network. New nodes add a limited number of edges. Transactions mostly reuse already existing edges: an indication of preferential attachment. This also indicates an increasing centralisation in the network, as smart contract nodes and exchanges effectively act as hubs through which most transactions are executed.

Fig. 1. Evolution of network density as a function of network size. As the network size grows the network does not densify; rather, the density scales as \(d \propto N^{-1}\), which is evidence of a preferential attachment process.
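The link between linear edge growth and the \(N^{-1}\) scaling can be checked with a short calculation. The sketch assumes the standard density of a directed graph without self-loops, \(d = E / (N(N-1))\); the paper does not spell out its exact density definition, and the function name and the constant of two edges per node are illustrative.

```python
def directed_density(num_nodes, num_edges):
    """Density of a directed graph without self-loops: E / (N * (N - 1))."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


# If every new node brings a fixed number of edges (here c = 2), then E = c * N
# and the product d * N stays close to c, i.e. d is proportional to 1 / N.
edges_per_node = 2
for size in (1_000, 10_000, 100_000):
    density = directed_density(size, edges_per_node * size)
    print(size, density, density * size)
```

The last printed column approaches the constant number of edges per node as the network grows, which is the signature of a density that decays as \(N^{-1}\).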

Figure 2 shows the evolution of the best fit \(\alpha \) for preferential attachment over time in all four tokens for in-degree \(\alpha ^{in}\) and out-degree \(\alpha ^{out}\). We see that \(\alpha ^{in}\) stays consistently around 1.1 (and \(\alpha ^{out}\) around 1.0) for the entire time period of network evolution studied. This confirms a slight super-linear preferential attachment in the network from its start. This comes as no surprise, as the tokens are managed by the programmable logic of the smart contract nodes and traded via the exchanges, and these are present in the network from its start.

Fig. 2. Evolution of best fit \(\alpha \) for all four tokens for in-degree \(\alpha ^{in}\) (top panel) and out-degree \(\alpha ^{out}\) (bottom panel) for preferential attachment over time, with disjoint and non-cumulative time windows.

Fig. 3. Dismantling of the largest strongly connected component of each token network with three different strategies: removing first the highest in-degree nodes that are smart contract and known exchange addresses, removing first the highest in-degree EOA addresses, or a strategy combining the two.

Fig. 4. Scalar assortativity while dismantling the network.

3.1 Network Dismantling

Network dismantling refers to the general problem of finding the minimal number of nodes whose removal dismantles a network [14] into isolated subcomponents. It belongs to the class of nondeterministic polynomial hard (NP-hard) problems, which essentially implies that there is currently no known algorithm that can efficiently find the optimal dismantling solution for large-scale networks. However, there are approximate methods which work well enough in practice even for large networks [12, 13]. In this paper we are not interested in finding the most efficient dismantling strategy but rather in estimating the influence that the different types of nodes have on dismantling. Our aim is to assess their role in the structural integrity of the network. In our case, we are interested in the difference between two types of nodes. The first type corresponds to the addresses of smart contracts and known exchanges, a list of which was extracted from public sources such as [23], and which are controlled by the logic of the code. The second type corresponds to the addresses of externally owned accounts (EOAs), which are controlled by the actual users possessing the corresponding cryptographic keys.

Our dismantling strategy consists of repeatedly removing, one by one, the node of the appropriate type with the highest in-degree \(k_{in}\), and then recalculating the in-degrees of all the nodes before repeating the procedure. As a measure of dismantling we use the ratio of the Largest Strongly Connected Component (LSCC), i.e. the largest maximal set of graph nodes such that for every pair of nodes a and b, there is a directed path from a to b and a directed path from b to a. In our analysis we perform dismantling for up to 200 nodes of each type and for all four tokens separately, as shown in Fig. 3. We observe that for all four tokens the removal of nodes corresponding to the addresses of contracts and known exchanges causes faster dismantling than the removal of nodes corresponding to the addresses of EOAs: the LSCC collapses after removing just a handful of nodes. This indicates a large structural centralisation. Nodes corresponding to the addresses of smart contracts and known exchanges effectively act as hubs in the network. Unlike the nodes that correspond to addresses of EOAs, they have a crucial structural role because they are involved in the majority of the transactions. In the information security realm, intentional risk managers should protect these nodes the most [24]. We also performed additional dismantling for up to 10k nodes for each of the tokens, but this did not show qualitatively different results, so in Fig. 3 we only show results for up to 200 nodes.
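The procedure can be sketched with the standard library alone. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the LSCC is found with an iterative version of Kosaraju's algorithm, its size is normalised by the initial LSCC size (the text does not specify the normalisation), ties in in-degree are broken alphabetically, and the function names and toy network are hypothetical.

```python
from collections import defaultdict


def largest_scc_size(nodes, edges):
    """Size of the largest strongly connected component (Kosaraju, iterative)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for s, r in edges:
        if s in nodes and r in nodes:
            succ[s].append(r)
            pred[r].append(s)
    # Pass 1: record nodes in order of DFS completion on the original graph.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:  # all successors explored: the node is finished
                order.append(node)
                stack.pop()
    # Pass 2: explore the reversed graph in decreasing completion order.
    assigned, best = set(), 0
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        size, stack = 0, [start]
        while stack:
            node = stack.pop()
            size += 1
            for prev in pred[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        best = max(best, size)
    return best


def dismantle(nodes, edges, candidates, max_removals):
    """Remove the highest in-degree candidate one at a time, recomputing
    in-degrees after each removal; return LSCC size relative to the start."""
    nodes, edges = set(nodes), set(edges)
    remaining = set(candidates) & nodes
    initial = largest_scc_size(nodes, edges)
    trajectory = [1.0]
    for _ in range(max_removals):
        if not remaining or initial == 0:
            break
        k_in = defaultdict(int)
        for _, r in edges:
            k_in[r] += 1
        target = max(sorted(remaining), key=lambda n: k_in[n])
        nodes.discard(target)
        remaining.discard(target)
        edges = {(s, r) for s, r in edges if s != target and r != target}
        trajectory.append(largest_scc_size(nodes, edges) / initial)
    return trajectory


# Toy network: one contract "C" mediating four users, plus one direct transfer.
users = {"u1", "u2", "u3", "u4"}
toy_edges = {(u, "C") for u in users} | {("C", u) for u in users} | {("u1", "u2")}
toy_nodes = users | {"C"}
by_contract = dismantle(toy_nodes, toy_edges, {"C"}, 1)
by_eoa = dismantle(toy_nodes, toy_edges, users, 1)
```

In the toy network, removing the single contract node shrinks the LSCC from five nodes to one, whereas removing the highest in-degree user leaves four of the five nodes strongly connected.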

3.2 Assortativity

The assortativity coefficient r measures a general tendency of nodes of a certain degree \(k_{i}\) to attach to other nodes with similar degree. Its range is \(-1 \le r \le 1\). A positive value indicates assortative mixing: a high correlation between the degrees of neighbouring nodes, which usually form communities. A value close to zero suggests non-assortative mixing: very low degree correlation, typical in core-periphery structures found in broadcasting. Finally, a negative value reveals disassortative mixing: a negative correlation, found in structures optimised for maximum distributed information transmission. Equation 4 presents the standard definition of the assortativity coefficient r [25], where \(a_{i}=\sum _{j}e_{ij}\), \(b_{j}=\sum _{i}e_{ij}\) and \(e_{ij}\) is the fraction of edges from nodes of degree \(k_{i}\) to nodes of degree \(k_{j}\). Due to the high computational demand required to compute the assortativity coefficient in our transaction networks, we instead compute the scalar assortativity \(r_{s}\) defined in Equation 5 [25], which is particularly useful when the degree changes over time [26].

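$$\begin{aligned} r = \frac{\sum _{i} e_{ii} - \sum _{i} a_{i} b_{i}}{1 - \sum _{i} a_{i} b_{i}} \end{aligned}$$
(4)

$$\begin{aligned} r_{s} = \frac{\sum _{ij} k_{i} k_{j} \left( e_{ij} - a_{i} b_{j}\right) }{\sigma _{a} \sigma _{b}} \end{aligned}$$
(5)

where \(\sigma _{a}\) and \(\sigma _{b}\) are the standard deviations of the distributions \(a_{i}\) and \(b_{j}\).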
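Scalar assortativity is equivalent to a Pearson correlation computed across edges, which is how the sketch below evaluates it. The pairing of degrees is an assumption of this example: the source end of each edge is characterised by its out-degree and the target end by its in-degree, which the text does not specify. The function name and toy networks are likewise illustrative.

```python
from collections import defaultdict
from math import sqrt


def scalar_assortativity(edges):
    """Pearson correlation, over all directed edges, between the out-degree
    of the source node and the in-degree of the target node."""
    edges = list(edges)
    if not edges:
        raise ValueError("need at least one edge")
    k_in, k_out = defaultdict(int), defaultdict(int)
    for s, r in edges:
        k_out[s] += 1
        k_in[r] += 1
    xs = [k_out[s] for s, _ in edges]   # degree at the source end
    ys = [k_in[r] for _, r in edges]    # degree at the target end
    m = len(edges)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / m
    var_x = sum((x - mean_x) ** 2 for x in xs) / m
    var_y = sum((y - mean_y) ** 2 for y in ys) / m
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when all degrees at one end coincide
    return cov / sqrt(var_x * var_y)


# Hub-and-spoke network: every edge joins a low-degree node to the hub.
star = [(leaf, "hub") for leaf in ("a", "b", "c")] + \
       [("hub", leaf) for leaf in ("a", "b", "c")]
r_star = scalar_assortativity(star)

# A mutual pair plus a fully bidirectional triangle: like connects to like.
clusters = [("p", "q"), ("q", "p")] + \
           [(u, v) for u in "xyz" for v in "xyz" if u != v]
r_clusters = scalar_assortativity(clusters)
```

The hub-and-spoke network is perfectly disassortative (correlation of -1), while the network made of two separate, internally homogeneous groups is perfectly assortative (correlation of +1); the token networks in this paper sit between zero and the disassortative extreme.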

In Fig. 4 we show the scalar assortativity of the networks as we remove more and more nodes during dismantling, separately for two types of nodes, i.e., nodes corresponding to the addresses of smart contracts and known exchanges (blue line) and nodes corresponding to EOA addresses (orange line). The initial scalar assortativity of the networks is slightly negative but close to 0 (from \(-0.06\) for BAT and UNI to \(-0.20\) for AMP), which is not surprising considering the centralisation in the network: most of the small in-degree nodes are connected to the large central hubs, with very few connections between them. Removal of nodes corresponding to EOA addresses during dismantling has no discernible effect on the scalar assortativity, while for contracts and known exchanges the assortativity tends to increase towards zero, making the networks less centralised and almost non-assortative. This is probably because the first nodes to be removed during dismantling are the highly connected hubs. Removing these nodes first raises the assortativity of the network because many connections between low-degree and high-degree nodes, which contribute to the disassortativity of the network, are removed as well.

4 Discussion

Decentralised finance (DeFi), based on public blockchain technology, holds over USD 80B in assets as of September 2021. It aims to disrupt the traditional financial system by providing an alternative way to access financial services. It relies on automation to execute financial transactions on top of a decentralised public blockchain with no central governance. However, decentralisation in the underlying protocol does not necessarily imply decentralisation in the application space on top of it. Smart contracts providing DeFi services act as a central point for the protocol logic. We observe this centralisation in the transaction networks of DeFi-collateral tokens, where nodes corresponding to the addresses of smart contracts and known exchanges (controlled by the logic of code) exhibit different structural roles from the nodes corresponding to Externally Owned Account (EOA) addresses, which should be controlled directly by users.

The four types of DeFi-collateral Ethereum-based tokens we study span multiple use cases in the DeFi sector: an algorithmic stablecoin (Ampleforth, AMP), a utility token used in digital marketing (Basic Attention Token, BAT), a multi-currency pegged stablecoin (Dai, DAI) and a governance token used in the UNI decentralised exchange (Uniswap, UNI).

We analyse the transaction networks of these four tokens up to mid 2021 to evaluate the structural roles in the network of two types of nodes: those representing addresses driven by code and those that are human-driven. Our analysis shows an increasing centralisation of their transaction networks, with nodes corresponding to the addresses of smart contracts and known exchanges acting as hubs. We find a decreasing density in the network as new nodes are added, which scales inversely with the size of the network, as well as a slightly super-linear preferential attachment coefficient (\(\alpha ^{in}>1.0\)). The latter implies that few nodes are gaining most of the connections from the newly incoming nodes, a form of “winner takes all” effect commonly observed in social systems as well [27]. Those nodes should be protected the most from the information security viewpoint in terms of their availability and integrity. Network dismantling confirms that these highly connected nodes indeed correspond to the addresses of smart contracts and known exchanges and not to the EOAs, which are controlled by the actual users. Our network dismantling strategy removes one by one the nodes of each of the two types with the highest in-degree \(k_{in}\) and measures the effect on the Largest Strongly Connected Component (LSCC). Our results show that the removal of nodes corresponding to the addresses of smart contracts and known exchanges causes a much faster dismantling than the removal of nodes corresponding to EOAs. This confirms their structural role in the transaction network as hubs that mediate most of the transactions in the network.

Our analysis is restricted to only four representative tokens on Ethereum, the largest public blockchain for smart contracts. These results hint at a potentially inconvenient fact for the DeFi sector, which claims to offer decentralisation and inclusiveness in its financial services. Most decentralised applications (dApps) run on smart contracts which effectively centralise application logic. Exchanges, which process most of the transactions, contribute to centralisation regardless of whether they are centralised (in which case transactions are off-chain) or decentralised (powered by smart contracts, in which case transactions are on-chain). The underlying situation mimics the advent of online social networks in the mid-2000s. Although they run on nominally decentralised Internet protocols, they effectively centralised information flow within their application ecosystems over time. It seems that the DeFi sector is on a similar centralisation trajectory. However, the long-term consequences of this are as yet unknown.

Future work can focus on popular tokens such as the heavily used USD-pegged asset-collateralised stablecoins USDT and USDC and tokens that offer cross-chain compatibility or second-layer solutions (like AAVE or Polygon, respectively). Additionally, we suggest performing a similar analysis on newer smart contract blockchains such as Polkadot, Solana and Tezos. Finally, a novel research path would be to understand how second-layer blockchain solutions, which address scalability challenges in base layer blockchain protocols, and cross-chain compatibility protocols, which share information between different blockchains, influence decentralisation.