1 Introduction

Ethereum is one of the most popular blockchains, and its cryptocurrency has been increasing in value. Like other blockchains, such as Bitcoin, it uses a consensus algorithm called Proof of Work (PoW), a.k.a. mining, to maintain the integrity of the blockchain and to prevent double spending. The blockchain gives miners financial incentives to perform PoW in the form of newly minted coins and transaction fees paid by users who want to perform transactions. In Ethereum these fees are called “gas”.

The term comes from an analogy: just as a car needs fuel to run, gas is the fuel that powers the recording of transactions on the distributed ledger. Gas is the unit that measures the computational effort a miner needs to process a transaction, and its price is quoted in \(WEI = {10^{-18}} ETH\). The price of executing a transaction or a contract, or of deploying a smart contract, is \(Gas Cost \cdot Gas Price\) [10]. Like fuel prices in the real world, the gas price varies and is subject to a negotiation process. The sender of a transaction specifies the maximum amount they are willing to pay, and the miner has the option of accepting, partially refunding, or rejecting the offer.
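The relation between the sender's maximum offer, the fee actually charged, and the refund can be sketched as follows. This is a minimal illustration that assumes the legacy (pre-EIP-1559) fee model, in which the sender sets a gas limit and a gas price; the function name `settle` and the example values are hypothetical and do not come from the paper.

```python
WEI_PER_GWEI = 10**9


def settle(gas_limit: int, gas_used: int, gas_price_wei: int) -> tuple[int, int]:
    """Return (fee_charged_wei, refund_wei) for one transaction.

    Assumes the legacy fee model: the sender escrows gas_limit * gas_price,
    the miner is paid for the gas actually consumed, and the rest is refunded.
    """
    if gas_used > gas_limit:
        # This sketch does not model out-of-gas failures.
        raise ValueError("gas_used exceeds gas_limit")
    max_fee = gas_limit * gas_price_wei   # the most the sender agreed to pay
    fee = gas_used * gas_price_wei        # Gas Cost * Gas Price
    return fee, max_fee - fee


# Hypothetical example: limit of 30,000 gas, 21,000 gas consumed, 50 GWEI per gas.
fee_wei, refund_wei = settle(30_000, 21_000, 50 * WEI_PER_GWEI)
print(fee_wei, refund_wei)
```

Working in integer WEI avoids floating-point rounding, which is why the sketch converts GWEI to WEI before multiplying.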

Ethereum is also a hotbed for innovative Decentralized Applications fueled by Smart Contracts [21], the foundation for tokens representing digital assets or even real-world objects [14, 18]. Token transfers, like cryptocurrency transactions, have an impact on gas prices and hence provide valuable information for our prediction model.

2 Background and Related Works

2.1 Background

The Ethereum Blockchain. A block in the blockchain contains a header and several Merkle Patricia Trie structures [19], including one that holds the transactions. Our model uses a limited number of fields, as presented in Table 1.

Table 1. Block header fields

The Estimation Model. For this experiment we decided to use DeepAR [13]. Amazon SageMaker DeepAR is a tool that implements a supervised forecasting model based on autoregressive recurrent neural networks (RNN).

Unlike other forecasting methods, such as autoregressive integrated moving average (ARIMA) and exponential smoothing (ETS), DeepAR can learn a global model from multiple time series. The empirical results reported by the Amazon team [13] show an improvement on standard metrics of up to \(15\%\) compared to state-of-the-art methods, such as Facebook’s Prophet [16].

2.2 Related Works

Three methods to analyze and predict gas prices are highlighted in [9]. The first method relies on analyzing pending transactions in large Mempools [6]. A Mempool is a buffer area where pending transactions sent by Ethereum clients are stored before they are added to the Ethereum blockchain. This method is resource-intensive and complex to implement, as it requires access to multiple Mempools and assumes that the owners of these Mempools are honest.

A second method analyzes recently committed blocks using oracles, which are systems that connect the blockchain to the outside world. Specifically, gas price oracles advise users on the gas price to pay so that miners will accept the fees and commit the submitted transactions into subsequent blocks [15]. Examples include the Ethereum client Geth [8], EthGasStation [3], and Gas Station Express [4].

A forecasting model based on Gated Recurrent Units (GRU) [7], together with a Gas Recommendation engine that leverages the output of the forecasting model, was proposed in [20]. The approach used a Neural Net model that included an additional parameter reflecting the urgency of the transaction (the higher the gas price, the faster the transaction is committed). The model reduced fees by more than \(50\%\) while increasing the waiting time by 1.3 blocks, when compared to the Geth oracle.

Mars et al. [12] evaluated the LSTM, GRU, and Prophet models [16] for anticipating gas prices. In their empirical evaluation, the LSTM and GRU models produced better outcomes than both the Prophet model and the Geth oracle.

Table 2. Features used for training and inference (minute-by-minute intervals). The target time series are the Gas Prices and there are 6 additional dynamic features. During our experiments we found MATIC prices not to be helpful

A Gaussian process model to infer the minimum gas price is presented in ChihYun et al. [10]. A Gaussian process is a non-parametric Bayesian approach that estimates a posterior over functions from a prior over functions and the observed data. This model performs better than GasStation-Express and Geth only when gas prices fluctuate widely. For this reason, they propose a hybrid solution combining GasStation-Express with their model.

3 Experiments and Results

3.1 Data Collection and Pre-processing

According to Salinas et al. [13], the covariates can be item and/or time dependent. For collecting the historical blockchain data, we used the Kaggle Ethereum Blockchain Complete live historical data (BigQuery) [2], as well as live minute-by-minute Ether (ETH or \(\varXi \)) and Polygon MATIC prices from cryptodatadownload.com [1], as seen in Table 2. Our Jupyter notebook [11] prepared the data for the training.

For the training and validation phase, we averaged each time series over 20 min intervals. After experimenting with various time series frequencies, we found that 20 min intervals worked best for this type of data. Ethereum gas prices fluctuate widely and hence the data is very noisy. Although we did not smooth the data by eliminating outliers, our model performed very well. In the end, we had data processed at 20 min intervals for 291 days (January 1, 2021 to October 18, 2021). We used \(80\%\) of the data for training and \(20\%\) for validation.
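The downsampling and split described above can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's code: the helper names are hypothetical, the split is assumed to be chronological (the text does not say how the 80/20 split was drawn), and the flooring trick only works for bucket sizes that divide 60.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def downsample_mean(samples, minutes=20):
    """Average (timestamp, value) pairs into fixed-width buckets.

    `minutes` must divide 60 so that flooring the minute field is valid.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        floored = ts - timedelta(
            minutes=ts.minute % minutes,
            seconds=ts.second,
            microseconds=ts.microsecond,
        )
        buckets[floored].append(value)
    return [(ts, mean(vals)) for ts, vals in sorted(buckets.items())]


def chronological_split(series, train_fraction=0.8):
    """Keep the earliest part for training and the rest for validation."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Synthetic minute-by-minute data: 200 minutes starting on January 1, 2021.
start = datetime(2021, 1, 1)
minute_data = [(start + timedelta(minutes=i), float(i)) for i in range(200)]

resampled = downsample_mean(minute_data)
train, validation = chronological_split(resampled)
print(len(resampled), len(train), len(validation))
```

A chronological split keeps validation data strictly later than training data, which avoids leaking future values into training.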

3.2 Experimental Setup

We built a Python Jupyter notebook [11] and used Gluon Time Series (GluonTS) [5] for probabilistic time series modeling. The DeepAREstimator is an implementation of the model described by Salinas et al. [13]. We have configured the estimator as follows:

  • Prediction length of 40, thus providing \(40\cdot 20 = 800\) min = 13 h and 20 min.

  • Architecture of 4 layers with 40 cells per layer.

  • Dropout rate = 0.1.

  • Context length (number of steps) of 80 (double the prediction length). Context length is the number of points provided to the model to make the prediction.

  • Cell type GRU. We also experimented with LSTM cells, but we did not notice any significant improvement, only a slight slow-down of the training.

  • The learning rate callback had the following settings: patience = 10, base LR = \(10^{-3}\), decay factor = 0.5.

  • Training was configured to run for 200 epochs.

  • We selected the checkpoints from 2 models, based on the best metric values.
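The learning rate settings in the list above (patience = 10, base LR = \(10^{-3}\), decay factor = 0.5) can be read as a reduce-on-plateau rule. The sketch below is one plausible interpretation written in plain Python; it is not the GluonTS callback itself, and the class name, the `min_lr` parameter, and the choice of "no strict improvement" as the plateau test are assumptions.

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `decay_factor` after `patience`
    consecutive epochs without an improvement in the monitored loss."""

    def __init__(self, base_lr=1e-3, decay_factor=0.5, patience=10, min_lr=0.0):
        self.lr = base_lr
        self.decay_factor = decay_factor
        self.patience = patience
        self.min_lr = min_lr          # assumed floor, not stated in the paper
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss: float) -> float:
        """Record one epoch's loss and return the learning rate to use next."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.min_lr, self.lr * self.decay_factor)
                self.bad_epochs = 0
        return self.lr


scheduler = ReduceOnPlateau()
scheduler.step(1.0)                       # first epoch sets the best loss
rates = [scheduler.step(1.0) for _ in range(10)]  # ten epochs with no improvement
print(rates[-2], rates[-1])
```

With these settings the rate is halved only after ten stagnant epochs, so a 200-epoch run can lower it at most about twenty times.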

The experiments were performed on a desktop computer equipped with an Intel Xeon E3-1200 v6/7th Gen Core processor, 32 GB RAM, a 240 GB SSD, and an NVIDIA GeForce GTX 1080 Ti GPU, running Ubuntu 18.04.

3.3 Experimental Results

The tests were run for various date/time targets, and we observed empirically that the prediction metrics improved overall as we added more features.

To find the best combination of features, we used a greedy approach. First, we selected the feature with the highest impact by running the algorithm 7 times. Since using the MATIC prices gave worse results than using 0 dynamic features, we dropped this data from subsequent tests. Once we found the feature with the best results, we ran the training and inference with the remaining 5 features and again selected the one giving the best results. We continued the process for the rest of the features, thus performing a total of \(7+5+4+3+2+1=22\) trials.
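A generic greedy forward selection in this spirit can be sketched as below. It is an illustration, not the authors' procedure: the `evaluate` callable stands in for a full training and inference run, the feature names and the toy scoring function are hypothetical, and the exact number of trials depends on bookkeeping details that this sketch does not try to reproduce.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def greedy_forward_selection(
    candidates: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> List[Tuple[Tuple[str, ...], float]]:
    """Add one feature per round, always keeping the one with the lowest error.

    Candidates that score worse than the zero-feature baseline in the first
    round are dropped. Returns the (feature subset, error) pair of each round.
    """
    baseline = evaluate(frozenset())
    history: List[Tuple[Tuple[str, ...], float]] = [((), baseline)]
    selected: List[str] = []
    remaining = list(candidates)
    first_round = True
    while remaining:
        scores = {f: evaluate(frozenset(selected + [f])) for f in remaining}
        if first_round:
            remaining = [f for f in remaining if scores[f] <= baseline]
            first_round = False
            if not remaining:
                break
        best = min(remaining, key=scores.__getitem__)
        selected.append(best)
        remaining.remove(best)
        history.append((tuple(selected), scores[best]))
    return history


# Toy stand-in for a training run: each feature changes the error by a fixed amount.
gains = {"eth_price": 3.0, "gas_used": 2.0, "matic_price": -1.0}


def toy_error(features: FrozenSet[str]) -> float:
    return 10.0 - sum(gains[f] for f in features)


history = greedy_forward_selection(list(gains), toy_error)
for subset, error in history:
    print(subset, error)
```

Greedy selection evaluates far fewer subsets than an exhaustive search, at the cost of possibly missing combinations whose benefit only appears jointly.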

Given \(y_{i}\) as the observed value, \(\hat{y_{i}}\) as the predicted value, and n as the number of samples, we computed the following metrics using the sklearn package:

  1. Mean Absolute Error (MAE): \(MAE = \frac{\sum _{i=1}^{n}\left| y_{i} - \hat{y_{i}} \right| }{n}\)

  2. Quantile Loss (QL) for a given quantile \(q\), defined as: \(L(\hat{y_{i}}, y_{i}) = \max \{q(y_{i} - \hat{y_{i}}), (q - 1)(y_{i} - \hat{y_{i}})\}\). This value is averaged across all predictions. We compared the values obtained for the following quantiles: \(q \in \{0.1, 0.5, 0.9\}\).

  3. Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) defined as: \(MSE = \frac{\sum _{i=1}^{n}(y_{i} - \hat{y_{i}})^{2}}{n}\) and \(RMSE = \sqrt{\frac{\sum _{i=1}^{n}(y_{i} - \hat{y_{i}})^{2}}{n}} = \sqrt{MSE}\). To calculate the RMSE metric, we used data normalization by re-scaling the test and predicted data to have a mean of 0 and variance of 1.
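The metrics listed above can be written out in a few lines of plain Python, which makes their behavior easy to check by hand. This sketch is not the sklearn-based code used in the experiments. The quantile loss follows the pinball-loss convention of sklearn's `mean_pinball_loss`, in which under-prediction is weighted by \(q\) and over-prediction by \(1-q\). The normalization step assumes that the mean and standard deviation of the observed series are applied to both series; the text does not specify this detail.

```python
from math import sqrt
from statistics import mean, pstdev


def mae(y, y_hat):
    return mean(abs(a - b) for a, b in zip(y, y_hat))


def quantile_loss(y, y_hat, q):
    # Pinball loss: q * (y - y_hat) when under-predicting, (1 - q) * (y_hat - y) otherwise.
    return mean(max(q * (a - b), (q - 1) * (a - b)) for a, b in zip(y, y_hat))


def mse(y, y_hat):
    return mean((a - b) ** 2 for a, b in zip(y, y_hat))


def rmse(y, y_hat):
    return sqrt(mse(y, y_hat))


def normalized_rmse(y, y_hat):
    # Assumption: both series are re-scaled with the observed series' statistics.
    mu, sigma = mean(y), pstdev(y)
    return rmse([(a - mu) / sigma for a in y], [(b - mu) / sigma for b in y_hat])


observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

print("MAE :", mae(observed, predicted))
print("MSE :", mse(observed, predicted))
print("RMSE (normalized):", normalized_rmse(observed, predicted))
for q in (0.1, 0.5, 0.9):
    print("QL", q, quantile_loss(observed, predicted, q))
```

At \(q = 0.5\) the quantile loss equals half the MAE, which is a quick sanity check on any implementation.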

Our best performance result, obtained with 5 dynamic features, is presented in Fig. 1. Table 3 shows the values obtained for each metric, depending on the number of features.

Fig. 1. Prediction with 5 dynamic feature inputs: \(\varXi \) prices, Transaction Values, Committed Transactions, Token Transfers, and Gas Used

3.4 Discussion

Time series forecasting is one of the most important tools used by businesses, and there are a number of frameworks data scientists can use for this purpose. As expected, there is no “silver bullet” for any problem. The empirical research conducted by Žunić et al. [17] concludes that DeepAR (AWS) models “show superiority over classical methods only when they have a large number of signals over which to create a model, and in the case of articles with a short history”. Since cryptocurrency prices in general and \(\varXi \) in particular have numerous covariates that can be used, one can perform accurate predictions with a shorter history, a benefit that low-power (e.g. IoT) edge devices can take advantage of when deciding on the timing for submitting transactions to the blockchain.

Table 3. Comparison of metrics by number of dynamic features involved. Values are represented in WEI (\(1 \varXi =10^{18}WEI\)).

Our analyzed data ranges between January 1, 2021 and October 18, 2021. During this time, gas prices ranged between 10 and 4315 GWEI, a 431.5-fold difference. As of this writing, a regular \(\varXi \) transfer has a limit of 21,000 units of gas. At the given price range, one would have to pay anywhere between 0.00021 and roughly 0.09 \(\varXi \). At the current exchange rate of 1 \(\varXi \) to 4,189 USD, this comes to roughly between 1 and 380 USD for simply sending \(\varXi \) to a different address. This clearly shows the importance of timing these transactions based on accurate predictions.
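The cost range can be recomputed from the stated inputs (21,000 gas, 10 to 4315 GWEI, 4,189 USD per \(\varXi \)) with a short script. The function name is hypothetical, and `Decimal` is used only to avoid floating-point rounding in the unit conversions.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETH = 10**18
TRANSFER_GAS = 21_000  # gas limit of a regular transfer, as stated in the text


def transfer_cost(gas_price_gwei: int, eth_usd: int, gas: int = TRANSFER_GAS):
    """Return (fee in ETH, fee in USD) for a transfer at the given gas price."""
    fee_wei = gas * gas_price_gwei * WEI_PER_GWEI
    fee_eth = Decimal(fee_wei) / Decimal(WEI_PER_ETH)
    return fee_eth, fee_eth * Decimal(eth_usd)


for price_gwei in (10, 4315):
    fee_eth, fee_usd = transfer_cost(price_gwei, eth_usd=4189)
    print(f"{price_gwei:>5} GWEI -> {fee_eth} ETH, {fee_usd:.2f} USD")
```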

Although our experiment has room for improvement, it shows the power of probabilistic forecasting using DeepAR. Using this approach we were able to obtain accurate predictions in spite of a noisy dataset. DeepAR requires minimal feature engineering. We performed downsampling but did not remove outliers. We did, nevertheless, have to perform normalization, in spite of suggestions to the contrary in the literature: our model did not converge without normalizing the features first. By adding dynamic features, we achieved improvements in the prediction metrics (see Table 3). As Fig. 1 shows, DeepAR’s Monte Carlo sampling-based quantile estimates are accurate and can be very useful in practice.

4 Conclusions and Future Works

Our empirical analysis of Ethereum gas price prediction with DeepAR shows that carefully chosen covariates can improve model performance. Gas prices are impacted by various factors, including seasonality, volume of transactions, transaction values, number of token transactions, \(\varXi \) price, and amount of gas used per block. Our focus in the future will be to identify additional features that can improve the performance of our model, by researching factors that have an impact on the supply and demand for gas. Examples may include off-chain data, such as Twitter or other social media events that may influence the volume of transactions on the Ethereum blockchain and hence, indirectly, gas and/or \(\varXi \) prices.

As DeepAR performs better than Facebook’s Prophet with a smaller amount of sales data [17], we will research the potential for deploying our models on low-power connected devices.