1 Introduction

Cryptocurrency markets, notably exemplified by Ethereum, have become integral components of the global financial landscape, fostering new paradigms in finance and investment [1]. The dynamism and volatility inherent in these markets necessitate sophisticated forecasting models to empower investors [2], traders [3], and industry participants [4]. This study delves into the significance of Ethereum price prediction, offering a comprehensive analysis of forecasting methodologies and their implications.

Ethereum, with its smart contract functionality, has evolved into a linchpin of blockchain technology [5]. Its market, characterized by rapid fluctuations, presents a unique challenge and opportunity [6]. Accurate prediction of Ethereum prices is not merely a speculative exercise; it forms the backbone of strategic decision-making for various stakeholders [7]. Understanding the dynamics of Ethereum pricing is crucial for traders aiming to optimize investments, investors seeking informed entry or exit points, and industry participants navigating the nuances of this transformative financial ecosystem [8].

The motivation driving this research lies in the pivotal role Ethereum plays in shaping decentralized applications and financial instruments [9]. As Ethereum's impact extends globally, the ability to forecast its prices becomes paramount. The repercussions of precise prediction reverberate across diverse domains—from mitigating risks in investment portfolios to aiding policymakers in understanding the evolving nature of decentralized finance. The motivation is, therefore, rooted in the recognition that Ethereum's influence transcends individual transactions, impacting the broader financial landscape [10].

Accurate Ethereum price forecasts are not merely desirable but imperative [11]. In a market where shifts can be abrupt and substantial, the consequences of miscalculations are profound. This study addresses the pressing need for reliable predictive models, acknowledging that the implications extend far beyond individual profit margins [12]. The repercussions touch upon the stability and trust in emerging financial instruments, further underlining the urgency to refine and innovate in the realm of Ethereum price prediction [13].

As we delve into the EtherVoyant model and its comparative analyses, this study strives to provide not only predictive insights but also a nuanced understanding of the Ethereum market's intricacies. The ensuing sections not only present models but unravel the layers of challenges and opportunities in Ethereum price forecasting, with a vision to empower stakeholders in navigating this evolving financial frontier [14].

The central challenge is to create an effective Ethereum price prediction model that can account for market volatility, seasonality, and anomalies. By combining ARIMA and SARIMA to make use of their complementary capabilities, we aim to alleviate the limitations of conventional forecasting methods by developing a novel hybrid model, EtherVoyant.

The following are some of the goals of this research:

  • To improve Ethereum price forecasts, we will construct the EtherVoyant hybrid forecasting model, which utilizes both the ARIMA and SARIMA approaches.

  • To manage missing values, outliers, and other data irregularities in the Ethereum price dataset, feature engineering and data preprocessing will be performed.

  • To improve the EtherVoyant model's performance and produce more accurate price predictions, we will investigate hyperparameter tuning strategies.

  • To show the superiority of the EtherVoyant model in terms of forecasting accuracy by comparing its forecasts to those of individual ARIMA and SARIMA models.

  • To help global stakeholders make better investment and risk management decisions in the cryptocurrency sector by providing more accurate Ethereum price predictions using the EtherVoyant model.

This study's goals are to further our knowledge of Ethereum as a key blockchain asset and to contribute to the field of cryptocurrency analytics. The improved forecasting capabilities of the EtherVoyant model may help businesses and investors all over the world better manage the volatile cryptocurrency market.

The main contributions of this study to cryptocurrency price prediction are as follows:

  • A major advancement is the creation of the EtherVoyant hybrid forecasting model. EtherVoyant provides a more reliable method for predicting Ethereum prices by combining the advantages of the ARIMA and SARIMA methods. The architecture of this model bridges the gap between standard time series models and the unique difficulties of cryptocurrency price data.

  • This research shows that the EtherVoyant model is superior to both the ARIMA and SARIMA models when predicting Ethereum values, as shown through extensive experimental and evaluation work. EtherVoyant's improved precision can help market participants in the cryptocurrency industry implement more effective risk mitigation and investment methods.

  • The research uses cutting-edge feature engineering and data preparation techniques to account for practical issues including missing data, outliers, and seasonality. The model's ability to catch intricate patterns and movements in Ethereum price data is enhanced by these methods, allowing for more precise predictions.

  • The study finds the best settings for the EtherVoyant model by investigating different hyperparameter tuning strategies. This tuning improves the model's performance and reliability by making sure it makes optimal use of available historical data to make precise predictions.

  • Investors, dealers, and businesses all over the world who deal in cryptocurrencies may be affected in various ways by the EtherVoyant model's capacity to deliver more accurate Ethereum price predictions. Due to the dynamic and volatile nature of the cryptocurrency market, the model helps stakeholders make educated decisions and implement effective risk management strategies.

  • This study contributes to the developing field of cryptocurrency analytics, which is crucial to comprehending blockchain-based assets like Ethereum. The research provides a comprehensive methodological framework for developing and evaluating hybrid forecasting models for cryptocurrency price prediction; the EtherVoyant model is a major advancement in the application of state-of-the-art time series forecasting techniques to analyze and predict cryptocurrency prices. Future work in the field of cryptocurrency analytics can build on this approach.

2 Related work

The potential of cryptocurrencies to alter the global financial system has garnered a lot of attention in recent years [1]. Among these crypto assets, Ethereum stands out as a pioneering decentralized blockchain platform that makes it possible for smart contracts and decentralized apps to disrupt numerous markets [2]. LSTM-GRU time series forecasting models have shown potential in enhancing Ethereum price prediction in the US [3], thanks to their capacity to capture complex patterns and trends in sequential data. To further improve these models' forecasting ability, researchers have investigated the possibility of incorporating exogenous variables, such as macroeconomic data, legislative events, and adoption metrics. Network motif analysis has been used to forecast the value of Ethereum tokens with deep learning [4]. This technique makes use of network motifs to decipher the intricate interplay between tokens and estimate future token prices. It has been suggested [5] that short-term price movements in cryptocurrencies can be predicted using a combination of technical indicators and deep learning. This combined method is an attempt to capture the interplay between the short- and long-term dynamics of pricing data. Predicting the value of cryptocurrencies through the analysis of social media sentiment has also been investigated [6]. Researchers have attempted to measure the impact of market mood on cryptocurrency prices by relating price changes to sentiment on the Reddit network. Predictions of bitcoin prices have been made using ensemble methods of machine learning [7]. To improve the accuracy and reliability of price predictions, these models combine several learning techniques. The popularity of transformers in NLP tasks indicates that they may be useful in capturing long-range dependencies in bitcoin price data [8]. The precision and accuracy of models built with machine learning have been measured empirically [9].
These tests demonstrate the ability of LSTM-GRU models to surpass current approaches to Ethereum price prediction and illustrate their supremacy in this area. The effects of free/libre/open-source software development on software engineering curricula have been studied [10], along with the advantages and disadvantages. This study adds to our knowledge of how open-source software design can affect pedagogical methods. Several cryptocurrency exchanges have made heavy use of machine learning and deep learning models for price prediction [11]. In this analysis, we compare the accuracy of different models for predicting bitcoin values on different markets. LSTM and embedding networks have been investigated by researchers as potential tools for better predicting bitcoin price fluctuations [12]. This method highlights the promise of using attentive LSTM and embedding networks to predict the future value of cryptocurrencies. The strengths and limitations of several machine learning approaches have been compared and contrasted in a study on cryptocurrency price prediction [13]. The use of FB-Prophet for time series forecasting of Ethereum price has been explored to shed light on potential future price movements [14]. The cryptocurrency price forecasting method used by FB-Prophet is straightforward and easy to understand. Predicting the price of a cryptocurrency using a deep learning system has increased investor knowledge of the market [15]. This study aids in clarifying the role of deep learning in economic prediction. By combining stance detection with transformers, analysts can use sentiment analysis to anticipate changes in bitcoin prices [16]. This research shows how effective transformers may be in capturing market mood. For the purpose of predicting the price of Ethereum, topological data analysis has been used [17]. 
This study compares the accuracy of Ethereum price forecasts using k-NN and multiple polynomial regression, advancing our knowledge of topological approaches to financial forecasting [18]. The results of this research investigating the effectiveness of both conventional and ML-based approaches are presented. Using long short-term memory (LSTM) networks, researchers have looked into Ethereum price time series forecasting [19]. Price data with temporal dependencies is found to be well captured by LSTM networks. It has been suggested [11] that the price of cryptocurrencies can be predicted using an ensemble and multimodal technique. To improve forecast accuracy, this method makes use of many data sources and models. There is a well-organized overview [20] on the use of machine learning-based time series analysis to forecast bitcoin prices. This research provides a state-of-the-art review on bitcoin price prediction. Researchers have used network motif analysis to look into the intricate connections between tokens and their values [21]. Using this method, we can see how the structure of the network influences our ability to anticipate prices. Spatial indicators for attaining healthy and sustainable cities have been the subject of research [22], and it has been found that these indicators can be developed utilizing open data and open-source software. This study aids in clarifying the function of free and open-source software in urban planning and design. Predictions of bitcoin values using deep learning algorithms have been made in an effort to raise investor education [23]. This research demonstrates how useful deep learning could be for economic prediction. Predictions of cryptocurrency prices using machine learning have been systematically compared [24]. The research sheds light on the benefits and limitations of different prediction methods.

In recent years, the transformative potential of cryptocurrencies, especially Ethereum, in reshaping the global financial system has been a focal point of research [1]. Ethereum's innovative decentralized blockchain platform, enabling smart contracts and decentralized apps, has disrupted various markets [2]. The application of LSTM-GRU time series forecasting models in Ethereum price prediction in the US has demonstrated promise, leveraging their ability to capture intricate patterns in sequential data [3]. To enhance forecasting accuracy, researchers have explored incorporating exogenous variables such as macroeconomic data, legislative events, and adoption metrics into LSTM-GRU models. Network motif analysis, using motifs to decipher the intricate interplay between tokens, has been employed for forecasting Ethereum token values [4]. The intersection of technical indicators and deep learning has been proposed for short-term price movement predictions in cryptocurrencies [5].

Various studies have investigated prediction methods, ranging from social media sentiment analysis [6] to ensemble machine learning techniques [7]. The integration of transformers, known for their efficacy in NLP tasks, has been explored to capture long-range dependencies in bitcoin price data [8]. LSTM networks have been utilized to capture temporal dependencies in Ethereum price time series forecasting [19]. Despite the plethora of research, a critical research gap persists in the lack of comprehensive models capable of effectively capturing the myriad patterns and dependencies in the volatile cryptocurrency market. Existing literature predominantly focuses on LSTM-GRU models, transformers, sentiment analysis, and other methods, with limited standardization of evaluation metrics and result comparisons. Notably, there is a dearth of studies addressing how to enhance prediction accuracy by incorporating external factors such as macroeconomic data and regulatory developments.

The identified research gap lies in the absence of standardized evaluation metrics and comprehensive models capable of accommodating the complexity of the cryptocurrency market. While LSTM-GRU models and similar techniques show promise, a critical need exists to establish a benchmark for comparison and to explore avenues for improving prediction accuracy. This research aims to address these limitations by providing a comparative analysis, contributing to the establishment of standardized evaluation metrics, and proposing models that integrate external variables. Table 1 shows the comparative table.

Table 1 Comparative table

The lack of comprehensive and robust models that can successfully capture the numerous patterns and dependencies found in the volatile and complex cryptocurrency market is the main research gap in the field of bitcoin price prediction utilizing machine learning and deep learning techniques. There has been a lot of research on LSTM-GRU models, transformers, sentiment analysis, and other methods, but there has not been nearly as much work done to standardize evaluation metrics and compare results. Even less study has focused on how to improve prediction accuracy by including external elements such as macroeconomic data and regulatory developments. If this knowledge gap could be closed, investors and other market participants would have better tools at their disposal to predict the future value of cryptocurrencies. By undertaking this study, we seek to not only contribute to the current understanding of Ethereum price prediction but also to offer practical tools for investors and market participants. Closing the knowledge gap identified will empower stakeholders with robust and reliable forecasting instruments, facilitating a more nuanced and informed approach to navigating the dynamic landscape of cryptocurrency investments.

3 Materials and methods

In this section, we detail the resources and procedures that went into creating and testing the EtherVoyant hybrid forecasting model, which we claim can be used to reliably predict future Ethereum price movements. The study methodology comprises data collection, preprocessing, model construction, and evaluation. Features, hyperparameter tuning, and the metrics for gauging EtherVoyant's performance are all discussed in detail. The forecasting model is trained and tested using historical Ethereum price data. Time series techniques, specifically ARIMA and SARIMA, are incorporated into the EtherVoyant model to handle the peculiarities of cryptocurrency price data, and we explain how these methods are combined to take advantage of their synergies and improve forecasting accuracy. We also describe how feature engineering and data preprocessing were applied to the Ethereum price dataset to deal with real-world issues such as missing values and seasonality. The EtherVoyant model makes use of these methods to reflect the fluctuations and complexity of the cryptocurrency market. In addition, we describe the hyperparameter tuning process, in which different parameter settings are tested to find the configuration that gives the best model performance. EtherVoyant's ability to make accurate and trustworthy Ethereum price forecasts relies on these carefully chosen hyperparameters. Finally, we detail the criteria used to compare EtherVoyant's predictions with those of the individual ARIMA and SARIMA models. This evaluation is critical for establishing EtherVoyant's merit and verifying its capacity to equip stakeholders with more accurate and informed Ethereum price predictions.

3.1 Dataset description

The dataset utilized in this investigation spans from August 7, 2015, to October 18, 2021, covering a comprehensive timeframe for Ethereum price analysis. The dataset encompasses various essential elements, detailed in Table 2:

Table 2 Dataset feature description

Each entry in the time series is uniquely identified by its corresponding value in the "Date" column. Daily price variations are comprehensively documented through the "Open," "High," "Low," and "Close" columns, with the "Adj Close" column incorporating adjustments related to corporate activities. The "Volume" column provides valuable insights into market liquidity and investor interest, reflecting daily trading activity.

The EtherVoyant model is trained, validated, and tested on this dataset, and its performance is compared to that of individual ARIMA and SARIMA models during the course of the study. As a result of Ethereum's extensive price history, stakeholders can make informed judgements in the fast-moving cryptocurrency market (Figs. 1, 2, 3, 4, 5, and 6).

Fig. 1 Visualization of each feature versus time

Fig. 2 Ethereum price analysis

Fig. 3 Ethereum trading volume

Fig. 4 Pairwise plots for Ethereum price data

Fig. 5 Pair plots for Ethereum price data

Fig. 6 Correlation heatmap of Ethereum price data

3.2 Data preprocessing

The success of the EtherVoyant model depends on a high-quality dataset, which can only be obtained through careful data preprocessing. The steps in this procedure are handling missing values, removing outliers, and normalizing the data.

3.2.1 Handling missing values

For various reasons, such as incomplete data collection or data entry mistakes, the dataset may have some missing values. We use imputation approaches to address missing values, such as filling them in using the mean, the median, or interpolation based on neighboring values.

3.2.2 Removing outliers

When the data distribution is skewed by outlying values, the model's accuracy can suffer. Using statistical tools such as the Z-score and the interquartile range (IQR), we find extreme values and either eliminate them or replace them with more reasonable ones.

3.2.3 Data normalization

To ensure that no single characteristic dominates the model training process due to its bigger magnitude, normalization of the data is required. Min–Max scaling and Z-score normalization are two common normalization methods.

The equations for preparing the data are as follows:

3.2.4 Handling missing values

Let X denote the feature vector that contains missing values.

Let X_filled denote the feature vector after an imputation procedure has been applied.

$$ X_{{{\text{filled}}}} = {\text{Impute}}\left( X \right) $$
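As an illustration of one possible Impute(·), the following minimal sketch fills gaps by linear interpolation between neighboring observed values, one of the options named in Sect. 3.2.1. The function name `interpolate_missing`, the use of `None` to mark a missing entry, and the choice to copy the nearest observed value at the edges are assumptions made for this example; the study does not specify which imputation method was finally applied.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between observed neighbours.

    Gaps at the start or end of the series have only one observed
    neighbour, so they copy that neighbour (an assumption of this sketch).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")

    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:          # gap at the start of the series
            filled[i] = values[right]
        elif right is None:       # gap at the end of the series
            filled[i] = values[left]
        else:                     # interior gap: interpolate linearly
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical closing prices with missing days (not taken from the dataset).
close = [None, 10.0, None, 14.0, None, None, None, 22.0, None]
close_filled = interpolate_missing(close)
```

Interpolation preserves the local trend of a price series, whereas mean or median filling would pull gaps toward a global level; which is preferable depends on how long the gaps are.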

3.2.5 Outlier removal

Let X represent the feature vector that contains the outliers.

The feature vector after outliers have been removed or replaced is denoted by X_cleaned.

$$ X_{{{\text{cleaned}}}} = {\text{Remove}}\;{\text{Outliers}}\left( X \right) $$
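A minimal sketch of the IQR rule mentioned in Sect. 3.2.2 is given below. It replaces extreme values with the nearest fence rather than deleting them, which keeps the time index intact. The function name `clip_outliers_iqr`, the conventional multiplier k = 1.5, and the quartile convention (the default "exclusive" method of Python's `statistics.quantiles`) are assumptions of this example, not settings reported by the study.

```python
from statistics import quantiles


def clip_outliers_iqr(values, k=1.5):
    """Clip values lying outside [Q1 - k*IQR, Q3 + k*IQR] to the fences."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical values with one obvious spike (not taken from the dataset).
volume = [10, 11, 12, 13, 14, 15, 100]
volume_cleaned = clip_outliers_iqr(volume)
```

For this input the quartiles are 11 and 15, so the fences are 5 and 21 and only the spike of 100 is changed.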

3.2.6 Data normalization

The feature vector that needs normalization is denoted by X.

Let X_normalized denote the normalized feature vector.

$$ X_{{{\text{normalized}}}} = {\text{Normalize}}\left( X \right) $$
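The two normalization methods named in Sect. 3.2.3 can be sketched as follows. The function names are chosen for this example, and the use of the population standard deviation in the Z-score is an assumption; the study does not state which variant was used.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Centre values on zero mean and scale to unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot z-score a constant series")
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values (not taken from the dataset).
feature = [2.0, 4.0, 6.0]
scaled = min_max_scale(feature)       # [0.0, 0.5, 1.0]
standardized = z_score_scale(feature)
```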

In order to provide reliable Ethereum price predictions, the EtherVoyant model requires high-quality input data that have been cleaned of missing values and outliers and whose features are scaled correctly (Fig. 7).

Fig. 7 Data cleaning: (a) before and after box plots and (b) before and after violin plots

3.3 Feature engineering

The EtherVoyant model's ability to capture intricate patterns and correlations in the Ethereum price data is greatly improved by the feature engineering process, in which we transform existing features and derive new ones from the original dataset. Feature engineering seeks to extract meaningful information and correlations from the existing features so that they become more useful and pertinent to the forecasting task (Fig. 8).

Fig. 8 Feature engineering: (a) before and after box plots and (b) before and after violin plots

3.4 ARIMA model

Using a combination of autoregression, differencing, and moving average components, ARIMA (AutoRegressive Integrated Moving Average) is a popular time series forecasting model. Time series data in which the mean and variance are constant across time are well-suited for the ARIMA model. Figure 9 shows the ARIMA model architecture. To ensure the efficacy of the ARIMA model, we conducted the augmented Dickey–Fuller (ADF) test to evaluate the stationarity of the Ethereum price data. The ADF test is a critical statistical tool for time series analysis that helps us understand the behavior of the underlying data. The ADF test results indicated that the Ethereum price series was non-stationary. Non-stationarity implies that statistical properties like mean and variance change over time. In the context of time series forecasting, non-stationary data can pose challenges as traditional models often assume a constant statistical structure.

Fig. 9 ARIMA model architecture

Non-stationarity can adversely affect the performance of the ARIMA model. ARIMA, being a model designed for stationary time series, might yield inaccurate results when applied to non-stationary data. To address this, differencing is employed. Differencing is a technique used to stabilize the mean and make the data more amenable to modeling. In our case, differencing involves computing the differences between consecutive Ethereum prices. This transforms the non-stationary series into a stationary one, allowing the ARIMA model to capture the underlying patterns more effectively (Fig. 9).
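The differencing step described above can be sketched in a few lines. The function name `difference` is chosen for this example; the parameter `d` corresponds to the differencing order of the ARIMA model.

```python
def difference(series, d=1):
    """Apply first differencing d times: y'(t) = y(t) - y(t-1)."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Hypothetical prices with an accelerating upward trend.
prices = [1, 3, 6, 10]
first_diff = difference(prices)        # [2, 3, 4]
second_diff = difference(prices, d=2)  # [1, 1]
```

Each round of differencing shortens the series by one observation. In this toy series the first difference still trends upward, while the second difference is constant, which illustrates why d is chosen as the smallest order that yields a stationary series.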

ARIMA is denoted by the notation ARIMA (p, d, q), where:

p: The number of lags (autoregressive terms) the model employs.

d: The degree of differencing that must be applied to the time series to make it stationary.

q: The number of moving average terms in the model.

The ARIMA model equation can be represented as follows:

$$y\left(t\right)= c + \Sigma \left({\varphi }_{i}* y\left(t-i\right)\right)+ \Sigma \left({\theta }_{i}* \varepsilon \left(t-i\right)\right)+ \varepsilon \left(t\right)$$

where y(t) is the value of the series at time t, and c is the constant term, or intercept.

\({\varphi }_{i}\) are the autoregressive coefficients, which show how previous observations affect the current value.

\(\varepsilon (t)\) is the white noise error term at time t.

\({\theta }_{i}\) are the moving average coefficients, which capture the influence of previous error terms on the current value.
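To make the equation concrete, the sketch below evaluates its right-hand side for a one-step-ahead forecast, setting the unknown future error ε(t) to its expected value of zero. The function name `arma_one_step` and all numerical values are hypothetical; in an ARIMA model with d > 0 the inputs would be the differenced series rather than raw prices.

```python
def arma_one_step(history, errors, c, phi, theta):
    """Evaluate c + sum(phi_i * y(t-i)) + sum(theta_i * eps(t-i)).

    history[-1] is y(t-1) and errors[-1] is eps(t-1); the future error
    eps(t) is replaced by its expectation, zero.
    """
    ar_part = sum(p * history[-i] for i, p in enumerate(phi, start=1))
    ma_part = sum(q * errors[-i] for i, q in enumerate(theta, start=1))
    return c + ar_part + ma_part


# Hypothetical ARMA(2, 1) coefficients and recent values.
forecast = arma_one_step(
    history=[100.0, 102.0],  # y(t-2), y(t-1)
    errors=[0.5, -1.0],      # eps(t-2), eps(t-1)
    c=1.0,
    phi=[0.5, 0.25],         # phi_1, phi_2
    theta=[0.5],             # theta_1
)
# 1.0 + (0.5 * 102.0 + 0.25 * 100.0) + (0.5 * -1.0) = 76.5
```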

The following are the main components of the ARIMA model:

Stationarity check: Statistical techniques, such as the augmented Dickey–Fuller (ADF) test, are used to determine whether the time series data are stationary. If they are not, differencing is applied (differencing parameter d) to make them stationary.

Order identification: The autocorrelation function (ACF) and partial autocorrelation function (PACF) are analyzed to determine suitable values for the autoregressive order (p) and the moving average order (q).

Model fitting: Using the selected values of p, d, and q, the ARIMA model is fitted to the differenced time series data. Maximum likelihood estimation is used to determine the model's coefficients (\({\varphi }_{i}\) and \({\theta }_{i}\)).

Model forecasting: The time series' future values are then projected using the ARIMA model.
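As a small illustration of the ACF analysis in the steps above, the following sketch computes the sample autocorrelation function directly from its definition. The function name `sample_acf` is chosen for this example; the PACF, which requires solving for partial correlations, is not covered here.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelations r(0), ..., r(max_lag)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # n times the sample variance
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# A strictly alternating toy series has strong negative lag-1 correlation.
acf = sample_acf([1, -1, 1, -1, 1, -1], max_lag=2)
```

In practice, a slowly decaying ACF suggests that further differencing is needed, while a sharp cut-off after lag q is the usual heuristic for choosing the moving average order.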

The ARIMA model can effectively capture linear trends and autocorrelations in univariate time series data. Accurate forecasts in the presence of outliers and seasonality may necessitate further adjustment, however, and it may not be able to handle complex nonlinear interactions.

3.5 SARIMA model

In order to account for seasonality in time series data, the ARIMA model was extended to become SARIMA (Seasonal AutoRegressive Integrated Moving Average). When the values in a time series demonstrate distinct seasonal trends across set time periods (daily, weekly, or monthly), this method can be very helpful. Similar to the ARIMA section, an ADF test was conducted for the SARIMA model. The non-stationarity of the Ethereum price data was reaffirmed, emphasizing the necessity for proper preprocessing steps. The SARIMA model encounters similar challenges with non-stationary data. Hence, differencing is once again employed to induce stationarity. Figure 10 shows the SARIMA model architecture.

Fig. 10 SARIMA model architecture

SARIMA is denoted by the notation SARIMA (p, d, q) (P, D, Q, s), where:

p: The number of autoregressive terms (lags) in the non-seasonal part of the model.

d: The degree of differencing required to make the non-seasonal time series stationary.

q: The number of moving average terms in the non-seasonal part of the model.

P: The number of autoregressive terms (lags) in the seasonal part of the model.

D: The degree of differencing required to make the seasonal time series stationary.

Q: The number of moving average terms in the seasonal part of the model.

s: The seasonal period (e.g., 7 for weekly seasonality in daily data and 12 for yearly seasonality in monthly data).

The SARIMA model equation can be represented as follows:

$$y\left(t\right)= c + \Sigma \left({\varphi }_{i}* y\left(t-i\right)\right)+ \Sigma \left({\theta }_{i}* \varepsilon \left(t-i\right)\right)+ \Sigma \left({\Phi }_{i}* y\left(t-is\right)\right)+ \Sigma \left({\Theta }_{i}* \varepsilon \left(t-is\right)\right)+ \varepsilon \left(t\right)$$

where:

y(t) is the value of the time series at time t.

\(c\) is the constant term or intercept.

\({\varphi }_{i}\) are the autoregressive coefficients in the non-seasonal part, representing the impact of the past observations on the current value.

\(\varepsilon (t)\) is the white noise error term at time t.

\({\theta }_{i}\) are the moving average coefficients in the non-seasonal part, representing the impact of past error terms on the current value.

\({\Phi }_{i}\) are the autoregressive coefficients in the seasonal part, representing the impact of past seasonal observations on the current value.

\({\Theta }_{i}\) are the moving average coefficients in the seasonal part, representing the impact of past seasonal error terms on the current value.

s is the seasonal period, indicating the number of time periods in one seasonal cycle.

The SARIMA model captures seasonal patterns in data through seasonal differencing and the selection of seasonal autoregressive and moving average terms (P, D, Q), but the key processes for fitting the SARIMA model are similar to those in the ARIMA model. The SARIMA model can successfully capture complicated patterns and trends in seasonal data, and it works well with time series data that include both seasonal and non-seasonal components.
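Seasonal differencing, mentioned above, subtracts from each observation the value one full seasonal cycle earlier. The sketch below uses the hypothetical function name `seasonal_difference`; `s` and `D` correspond to the seasonal period and seasonal differencing order in the SARIMA notation.

```python
def seasonal_difference(series, s, D=1):
    """Apply seasonal differencing D times: y'(t) = y(t) - y(t-s)."""
    out = list(series)
    for _ in range(D):
        out = [out[t] - out[t - s] for t in range(s, len(out))]
    return out


# Toy series: a period-3 pattern [5, 1, 2] whose level rises by 1 each cycle.
series = [5, 1, 2, 6, 2, 3, 7, 3, 4]
deseasonalized = seasonal_difference(series, s=3)  # [1, 1, 1, 1, 1, 1]
```

The repeating pattern is removed and only the cycle-to-cycle change remains; each round of seasonal differencing shortens the series by s observations.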

3.6 EtherVoyant model

Combining the best features of the ARIMA and SARIMA models, the EtherVoyant model is a hybrid time series forecasting model designed to anticipate Ethereum values accurately. It takes advantage of both the seasonal and non-seasonal features of the time series data to identify long-term trends and patterns in the price of cryptocurrencies. EtherVoyant represents a novel hybrid approach that amalgamates the strengths of both ARIMA and SARIMA models to enhance Ethereum price prediction accuracy. Each model has its merits and limitations; ARIMA excels in capturing linear trends, while SARIMA is adept at handling seasonality.

EtherVoyant begins by decomposing the Ethereum price time series into trend, seasonality, and residual components. This decomposition is pivotal for isolating the specific characteristics that ARIMA and SARIMA models are best suited to address.

ARIMA component: The trend component, representing the underlying linear patterns, is fed into an ARIMA model. ARIMA is well-suited for capturing these linear trends, providing a foundation for the overall prediction.

SARIMA component: Simultaneously, the seasonality component undergoes modeling with SARIMA. SARIMA excels in capturing cyclic patterns within the data, enhancing the model's ability to adjust for recurring market behaviors.

By integrating ARIMA and SARIMA in this manner, EtherVoyant combines their strengths. ARIMA tackles the linear aspects, while SARIMA handles seasonality, offering a more comprehensive understanding of Ethereum's price dynamics. EtherVoyant is thereby intended to overcome the limitations of each model. ARIMA, being inherently linear, might struggle with nonlinear elements. The introduction of SARIMA allows EtherVoyant to address nonlinear, seasonality-driven fluctuations, significantly improving the model's adaptability to the complex and dynamic nature of the cryptocurrency market.
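The decomposition step described above can be sketched with a classical additive decomposition. This is only an illustration under stated assumptions: the study does not specify the decomposition method, so the additive form, the centered moving average used for the trend, the restriction to odd periods, and the function name `decompose_additive` are all choices made for this example.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal, and residual parts (additive)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    n = len(series)
    half = period // 2

    # Trend: centered moving average, undefined (None) at the edges.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period

    # Seasonal: mean detrended value at each position of the cycle,
    # re-centered so the seasonal effects sum to zero over one period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    seasonal = [means[t % period] - centre for t in range(n)]

    # Residual: what trend and seasonality leave unexplained.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Toy series: linear trend t plus a period-3 pattern [1, -1, 0].
toy = [t + [1, -1, 0][t % 3] for t in range(12)]
trend, seasonal, residual = decompose_additive(toy, period=3)
```

In a pipeline of the kind described, the trend series would then be passed to an ARIMA model and the seasonal series to a SARIMA model. How the two forecasts are recombined is not stated in the text; summing them would be the natural counterpart of an additive decomposition, but that is an assumption.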
EtherVoyant (p, d, q, P, D, Q, s) is the mathematical notation for the EtherVoyant model:

p: The number of autoregressive terms (lags) used in the non-seasonal part of the model.

d: The degree of differencing required to make the non-seasonal time series stationary.

q: The number of moving average terms used in the non-seasonal part of the model.

P: The number of autoregressive terms (lags) used in the seasonal part of the model.

D: The degree of differencing required to make the seasonal time series stationary.

Q: The number of moving average terms used in the seasonal part of the model.

s: The seasonal period (e.g., 7 for weekly seasonality in daily data and 12 for yearly seasonality in monthly data).

To make more reliable Ethereum price forecasts, the EtherVoyant model combines the equations of ARIMA and SARIMA models, taking into account both non-seasonal and seasonal factors. Figure 11 shows the EtherVoyant model architecture.

Fig. 11 EtherVoyant model architecture

The following equation can be used to illustrate the EtherVoyant model:

$$y\left(t\right)= c + \Sigma \left({\varphi }_{i}* y\left(t-i\right)\right)+ \Sigma \left({\theta }_{i}* \varepsilon \left(t-i\right)\right)+ \Sigma \left({\Phi }_{i}* y\left(t-is\right)\right)+ \Sigma \left({\Theta }_{i}* \varepsilon \left(t-is\right)\right)+ \varepsilon \left(t\right)$$

where:

y(t) is the value of the Ethereum price at time t.

c is the constant term or intercept.

φ_i are the autoregressive coefficients in the non-seasonal part, representing the impact of the past observations on the current Ethereum price.

ε(t) is the white noise error term at time t.

θ_i are the moving average coefficients in the non-seasonal part, representing the impact of past error terms on the current Ethereum price.

Φ_i are the autoregressive coefficients in the seasonal part, representing the impact of past seasonal observations on the current Ethereum price.

Θ_i are the moving average coefficients in the seasonal part, representing the impact of past seasonal error terms on the current Ethereum price.

s is the seasonal period, that is, the number of time steps (observations) in one complete seasonal cycle.
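
To make the equation concrete, the sketch below evaluates its right-hand side for the next time step, given past observations, past residuals, and a set of coefficients. The function name, the coefficient values, and the toy history are assumptions chosen for illustration; they are not fitted values from the study.

```python
def one_step_forecast(y, eps, c, phi, theta, Phi, Theta, s):
    """Evaluate the model equation for the next time step.

    y, eps     : past observations and past residuals, oldest first
    phi, theta : non-seasonal AR and MA coefficients (lengths p and q)
    Phi, Theta : seasonal AR and MA coefficients (lengths P and Q)
    s          : seasonal period in time steps
    """
    if len(y) != len(eps):
        raise ValueError("y and eps must have the same length")
    needed = max(len(phi), len(theta), s * len(Phi), s * len(Theta))
    if len(y) < needed:
        raise ValueError("history is shorter than the longest lag")
    t = len(y)  # index of the value being predicted
    pred = c
    pred += sum(phi[i - 1] * y[t - i] for i in range(1, len(phi) + 1))
    pred += sum(theta[i - 1] * eps[t - i] for i in range(1, len(theta) + 1))
    pred += sum(Phi[i - 1] * y[t - i * s] for i in range(1, len(Phi) + 1))
    pred += sum(Theta[i - 1] * eps[t - i * s] for i in range(1, len(Theta) + 1))
    # The future shock eps(t) has expectation zero and is therefore omitted.
    return pred


# Made-up history of eight daily values with p = q = P = Q = 1 and s = 7.
y = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
eps = [0.0, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 1.0]
forecast = one_step_forecast(y, eps, c=1.0, phi=[0.5], theta=[0.2],
                             Phi=[0.3], Theta=[0.1], s=7)
# 1.0 + 0.5*15 + 0.2*1.0 + 0.3*12 + 0.1*0.5
print(round(forecast, 2))  # 12.35
```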

Using the ARIMA and SARIMA model fitting procedures, the values of p, d, q, P, D, Q, and s are determined and used to train the EtherVoyant model on historical Ethereum price data. Maximum likelihood estimation is used to determine the model's parameters.

The EtherVoyant model successfully captures seasonal and non-seasonal patterns in Ethereum price data by merging the ARIMA and SARIMA models. This makes EtherVoyant a robust and reliable model for estimating the future value of Ethereum.

All in all, the EtherVoyant model is a useful resource for improving global Ethereum price predictions and aiding decision-making in the cryptocurrency market because of its hybrid approach, consideration of seasonality, enhanced accuracy, and contributions to the field of time series forecasting.

3.7 Stationarity testing with augmented Dickey–Fuller (ADF) test

In time series analysis, ensuring stationarity is pivotal for the efficacy of models such as ARIMA and SARIMA. The augmented Dickey–Fuller (ADF) test emerges as a fundamental tool in assessing stationarity. This section outlines the null hypothesis being tested and the implications of the ADF results for each model (Table 3).

Table 3 Augmented Dickey–Fuller (ADF)

ARIMA vs. SARIMA: The SARIMA model outperforms ARIMA across all metrics, suggesting the significance of seasonal components in Ethereum price fluctuations.

EtherVoyant vs. Baseline Models: EtherVoyant exhibits superior performance compared to both ARIMA and SARIMA, emphasizing the effectiveness of the hybrid approach.

Short-Term Forecast: Both ARIMA and SARIMA perform well in short-term predictions, capturing day-to-day variations accurately.

Long-Term Forecast: EtherVoyant demonstrates enhanced accuracy in long-term predictions, leveraging the strengths of both ARIMA and SARIMA components.

For both ARIMA and SARIMA models, the null hypothesis (H0) of the ADF test is that the time series data have a unit root, indicating non-stationarity. The alternative hypothesis (H1) is that the data are stationary after differencing. The ADF test results for the ARIMA model provide crucial insights into the stationarity of the Ethereum price time series. A rejection of the null hypothesis (H0) in favor of stationarity is an essential prerequisite for the reliability of ARIMA predictions.

Similarly, for the SARIMA model, the ADF test serves as a diagnostic tool. Rejection of the null hypothesis (H0) indicates that the differenced series is stationary, reinforcing the model's suitability for capturing temporal patterns.
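
The decision rule of the test can be sketched as follows. For brevity, this example implements the plain Dickey–Fuller regression without lagged difference terms, which is a simplification of the augmented test, and it uses approximate large-sample critical values for a regression with a constant. Both choices, the function names, and the simulated series are assumptions of this sketch; statistical packages report sample-adjusted critical values and a p-value.

```python
import math
import random

# Approximate large-sample critical values for the test regression with a
# constant and no trend (an assumption of this sketch).
CRITICAL_VALUES = {"1%": -3.43, "5%": -2.86, "10%": -2.57}


def dickey_fuller_stat(y):
    """t-statistic of gamma in dy(t) = alpha + gamma * y(t-1) + e(t)."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(dy)
    x_bar, dy_bar = sum(x) / n, sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = dy_bar - gamma * x_bar
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma


def rejects_unit_root(stat, level="5%"):
    """Reject H0 (unit root) only when the statistic lies below the critical value."""
    return stat < CRITICAL_VALUES[level]


# Simulated, strongly mean-reverting AR(1) series (not Ethereum data).
random.seed(0)
stationary = [0.0]
for _ in range(499):
    stationary.append(0.2 * stationary[-1] + random.gauss(0.0, 1.0))

stat = dickey_fuller_stat(stationary)
print(rejects_unit_root(stat))  # H0 is rejected for this stationary series
print(rejects_unit_root(2.52))  # a positive statistic cannot reject H0
```

Because the critical values are negative, a statistic above them leads to a failure to reject H0, in which case the series is differenced and tested again.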

4 Results and discussion

Here, we report the outcomes of forecasting Ethereum prices with the ARIMA, SARIMA, and EtherVoyant models. The effectiveness, limitations, and forecasting insights of each model are discussed. The models are assessed with the mean absolute error, mean absolute percentage error, mean squared error, and root mean squared error. This section begins with a brief overview of these metrics. After that, we show the short-term and long-term Ethereum price projections from each model. Each model's capacity to capture market trends and seasonality, and its sensitivity to the different factors affecting Ethereum pricing, are discussed at length. We further emphasize the value of the hybrid approach taken by the EtherVoyant model over the ARIMA and SARIMA models separately in terms of prediction accuracy. The section closes with the ramifications of accurate Ethereum price predictions for different groups of stakeholders, including traders, investors, and industry participants. We also address potential future research directions and areas for model enhancement.

In the SARIMA section, we acknowledged the potential risk of overfitting, especially when dealing with complex models in time series forecasting. Overfitting occurs when a model learns not just the underlying patterns but also the noise in the training data, leading to poor generalization on new, unseen data. EtherVoyant addresses this concern through meticulous regularization techniques. Regularization is a crucial aspect of model training that prevents it from becoming too specialized in the training data. For SARIMA components, the EtherVoyant model incorporates:

L1 Regularization: This technique imposes a penalty on the absolute size of the SARIMA coefficients. It discourages overly complex models by pushing less influential parameters toward zero.

L2 Regularization: Also known as weight decay, L2 regularization penalizes the square of the coefficients' values. This discourages large coefficients, preventing them from dominating the model.
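
The effect of the two penalties can be illustrated with a penalized objective. The text does not specify how the penalties enter the estimation, so this sketch assumes they are added to a mean squared residual; the function name, the penalty weights, and the numbers are hypothetical.

```python
def penalized_loss(residuals, coefficients, l1=0.0, l2=0.0):
    """Mean squared residual plus L1 and L2 penalties on the coefficients."""
    mse = sum(r * r for r in residuals) / len(residuals)
    l1_penalty = l1 * sum(abs(c) for c in coefficients)  # absolute sizes
    l2_penalty = l2 * sum(c * c for c in coefficients)   # squared sizes
    return mse + l1_penalty + l2_penalty


residuals = [1.0, -1.0, 2.0]   # made-up in-sample errors
coefficients = [0.5, -0.25]    # made-up seasonal coefficients

# mse = 2.0, L1 term = 0.1 * 0.75, L2 term = 0.1 * 0.3125
print(penalized_loss(residuals, coefficients, l1=0.1, l2=0.1))
```

With both weights at zero the objective reduces to the unpenalized mean squared residual, so the weights control how strongly large coefficients are discouraged.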

EtherVoyant engages in extensive hyperparameter tuning to strike the right balance between model complexity and generalization. This involves adjusting parameters such as the order of differencing, lag values, and seasonal components, ensuring that the model captures essential patterns without being overly influenced by noise. To further safeguard against overfitting, EtherVoyant employs cross-validation techniques. The model's performance is rigorously evaluated on multiple subsets of the data, providing a more robust assessment of its predictive capabilities on unseen data.
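
For time series, cross-validation folds must respect temporal order, so that the model is never trained on data that follow the test period. The text does not name the scheme used; the sketch below assumes an expanding-window (rolling-origin) split, and the function name and sizes are illustrative.

```python
def expanding_window_splits(n_obs, n_folds, test_size):
    """Yield (train_indices, test_indices); training data always precede test data."""
    first_test_start = n_obs - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        test_start = first_test_start + k * test_size
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Ten observations, three folds of two test points each.
for train, test in expanding_window_splits(n_obs=10, n_folds=3, test_size=2):
    print(len(train), test)
# 4 [4, 5]
# 6 [6, 7]
# 8 [8, 9]
```

Candidate orders (p, d, q, P, D, Q, s) can then be compared by averaging an error metric over the test folds.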

Additionally, the model's performance is evaluated using various metrics, such as mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), and root mean squared error (RMSE). These metrics provide a comprehensive view of how well EtherVoyant generalizes to different segments of the Ethereum price time series. By adopting these strategies, EtherVoyant not only optimally captures the complexity of Ethereum price dynamics but also guards against the pitfalls of overfitting, ensuring its predictions are reliable and applicable to real-world scenarios.

4.1 ARIMA model

In this section, the performance of the ARIMA model, a cornerstone of time series forecasting, is analyzed. Figure 12a, b depicts the autocorrelation function (ACF) and partial autocorrelation function (PACF), respectively. The ACF plot shows the correlation between the series and each of its lags, including the influence of the intermediate lags, whereas the PACF plot shows the correlation at each lag after the influence of the intermediate lags has been removed. We use these graphs to determine good values for the AR order p and the MA order q. Figure 13 then shows the historical and forecasted prices of Ethereum from 2018 to 2025. The forecasts are made using an ARIMA model that was trained on past data.

The stationarity of the time series data is assessed with the augmented Dickey–Fuller (ADF) test. The test yields an ADF statistic of 2.520274060051624 and a p-value of 0.9990560978056306, and the critical values at the 1%, 5%, and 10% levels of significance are also provided. Because the ADF statistic is larger than the critical values, we fail to reject the null hypothesis of a unit root. The data are therefore treated as non-stationary, and differencing is necessary.

The performance of the ARIMA model is analyzed with regard to both its strengths and limitations. The model's capacity to detect autocorrelation and short-term trends is emphasized. Its limitations in coping with seasonality and capturing complicated patterns are, however, also acknowledged. The ARIMA model provides a useful starting point for predicting Ethereum prices, although it may be constrained by the lack of seasonal components. Having introduced the seasonality problem, we proceed to examine in detail how the SARIMA and EtherVoyant models improve forecasting accuracy.

Fig. 12 ARIMA: (a) autocorrelation and (b) partial autocorrelation

Fig. 13 Actual and predicted prices till 2025 (price vs. date)
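
The quantities plotted in ACF and PACF graphs can be computed directly from a series. The following is a minimal sketch in plain Python that uses the sample autocorrelation and the Durbin-Levinson recursion for the partial autocorrelation; the function names and the five-point toy series are assumptions for illustration.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)]
        phi_curr.append(phi_kk)
        result.append(phi_kk)  # the last coefficient is the PACF at lag k
        phi_prev = phi_curr
    return result


series = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values
print([round(v, 4) for v in acf(series, 2)])   # [1.0, 0.4, -0.1]
print([round(v, 4) for v in pacf(series, 2)])  # [1.0, 0.4, -0.3095]
```

At lag 1 the two functions coincide because there are no intermediate lags to remove; they differ from lag 2 onward.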

4.2 SARIMA model

The SARIMA model is assessed for its ability to capture seasonal and non-seasonal patterns in time series data. First, in Fig. 14a, b, we show the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots for the SARIMA model. These charts help to identify the best settings for the seasonal autoregressive (SAR) and seasonal moving average (SMA) terms, denoted by P and Q, respectively, and appropriate values for P and Q are chosen based on them. Figure 15 then displays the historical and forecasted prices of Ethereum from 2018 to 2025. The SARIMA model is trained on the past Ethereum price data and then used to forecast future prices while accounting for seasonal and non-seasonal factors.

The augmented Dickey–Fuller (ADF) test, which is used to determine whether the data are stationary, yields an ADF statistic of 3.520274060051624 and a p-value of 0.99690560978056306. The critical values for the 1%, 5%, and 10% levels of significance are also reported. Because the ADF statistic is larger than the critical values, we fail to reject the null hypothesis, which indicates that the data remain non-stationary even after accounting for seasonality.

The performance of the SARIMA model is reviewed in depth, with an emphasis on the model's capacity to account for seasonality and to detect both short- and long-term trends. Its possible overfitting with complicated seasonality and its sensitivity to parameter choice are also taken into account as model limitations. The SARIMA model displays greater accuracy than the ARIMA model since it takes seasonality into account. However, it may still have limitations when it comes to capturing complicated patterns, especially if seasonality is erratic or dynamic. Following this, we present the EtherVoyant model, which combines the advantages of the ARIMA and SARIMA models to overcome their respective limitations and provide improved Ethereum price predictions.

Fig. 14 SARIMA: (a) autocorrelation and (b) partial autocorrelation

Fig. 15 Actual and predicted prices till 2025 (price vs. date)

4.3 EtherVoyant model

The EtherVoyant model is a hybrid of the ARIMA and SARIMA models for forecasting time series. Here, we report the results of our analysis of EtherVoyant's predictive performance. The autocorrelation function (ACF) and partial autocorrelation function (PACF) graphs for the EtherVoyant model are presented in Fig. 16a, b, respectively. These plots are used to tune the autoregressive (AR) and moving average (MA) orders, as well as the seasonal autoregressive (SAR) and seasonal moving average (SMA) orders, and the orders for the EtherVoyant model are chosen based on them. Figure 17 then shows the historical and forecasted prices of Ethereum until the year 2025 obtained with the EtherVoyant model. The model is trained on historical Ethereum price data and incorporates both seasonal and non-seasonal factors to improve its forecasts. The root mean squared error (RMSE) is the performance indicator used to evaluate the model. EtherVoyant's RMSE for predicting Ethereum prices is 866.682096793376, the lowest of the three models. We also find that (2, 1, 1) is the optimal ARIMA order and that (0, 1, 0) is the optimal SARIMA order.

The performance and advantages of the EtherVoyant model over the standard ARIMA and SARIMA models are reviewed at length. Its versatility is underlined by the fact that it can deal with seasonal and non-seasonal patterns, capture complicated trends, and make reliable forecasts far into the future. The model's flexibility and capacity to accommodate a wide variety of time series data are also highlighted. Overall, the EtherVoyant model demonstrates its capacity to support worldwide Ethereum price predictions, making it an important contribution to the field of time series forecasting. EtherVoyant addresses the limitations of the ARIMA and SARIMA models by combining their advantages.

Fig. 16 EtherVoyant: (a) autocorrelation and (b) partial autocorrelation

Fig. 17 Actual and predicted prices till 2025

4.4 Comparative analysis

In this section, we examine the ARIMA, SARIMA, and EtherVoyant forecasting models side by side and draw some conclusions about their relative merits. Mean absolute error (MAE), mean absolute percentage error (MAPE), and mean squared error (MSE) are among the error metrics used in the comparison. In addition, we provide a comparison of the models' short-term projections across a 5-year time horizon.

Subplots (a), (b), and (c) of Fig. 18 display the comparative analysis of errors. The MAE is the average absolute difference between the actual and predicted Ethereum prices and gauges the overall accuracy of each model. The MAPE expresses the same deviation as a percentage of the actual price, which makes errors comparable across price levels. Finally, the MSE is the average squared difference between the actual and predicted prices; because the errors are squared, it penalizes large errors more heavily.

Fig. 18 Comparative analysis of errors in normalized order: (a) MAE, (b) MAPE, and (c) MSE
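
The error metrics used in this comparison follow their standard definitions and can be computed as in the sketch below. The function names and the three sample prices are illustrative assumptions and are not results from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    return math.sqrt(mse(actual, predicted))


actual = [100.0, 200.0, 400.0]     # made-up prices
predicted = [110.0, 190.0, 380.0]  # made-up forecasts

print(round(mae(actual, predicted), 3))   # 13.333
print(round(mape(actual, predicted), 3))  # 6.667
print(mse(actual, predicted))             # 200.0
print(round(rmse(actual, predicted), 3))  # 14.142
```

The single error of 20 contributes 400 of the 600 squared units, which shows how the MSE and RMSE weight large deviations more heavily than the MAE does.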

Comparisons of the three models' forecasts over a 5-year period are shown in Figs. 19, 20, and 21. These charts show how well various models do at predicting short-term price changes in Ethereum (Tables 4, 5, and 6).

Fig. 19 Five years comparative prediction

Fig. 20 Five years comparative prediction

Fig. 21 Five years comparative prediction

Table 4 Comparative analysis
Table 5 Comprehensive error metrics
Table 6 Statistical significance (ADF test)

We give a table detailing the error metrics for each model to summarize the comparison investigation:

ARIMA vs. SARIMA: The SARIMA model outperforms ARIMA across all metrics, suggesting the significance of seasonal components in Ethereum price fluctuations.

EtherVoyant vs. Baseline Models: EtherVoyant exhibits superior performance compared to both ARIMA and SARIMA, emphasizing the effectiveness of the hybrid approach. The table summarizes the mean absolute error, mean absolute percentage error, and mean squared error of the ARIMA, SARIMA, and EtherVoyant forecasting models. Smaller values of MAE, MAPE, and MSE indicate more accurate predictions of Ethereum prices. The table shows that EtherVoyant produces more accurate forecasts than ARIMA and SARIMA on these metrics.

The comparative analysis shows that the EtherVoyant model outperforms the other two models in making reliable forecasts over a variety of time frames. A key factor contributing to EtherVoyant's superior forecasting accuracy is its hybrid technique, which combines the advantages of ARIMA and SARIMA.

5 Conclusions

Our research concluded that sophisticated time series forecasting algorithms can be applied to improve worldwide Ethereum price prediction. We investigated the ARIMA and SARIMA models, as well as our own hybrid model, EtherVoyant, evaluated their performance in depth, and compared them. The ARIMA model was used as a reference point because it effectively captured secular trends in Ethereum price movement. It provided useful insights into short-term changes, but it struggled with seasonality and had limitations in identifying complicated patterns. The SARIMA model accounted for seasonal factors to address the problem of seasonality. Compared to ARIMA, it showed a substantial improvement in accuracy, but it still struggled to capture irregular or shifting seasonality. The best results came from the EtherVoyant model, a combination of ARIMA and SARIMA. By integrating the characteristics of both models, EtherVoyant overcame their inherent limitations and obtained better predictive accuracy. Short-term trends and seasonal patterns were also captured, providing useful information for forecasting the future. The error metric comparisons confirmed EtherVoyant's lower MAE, MAPE, and MSE values, making it the clear winner over ARIMA and SARIMA.

In the realm of Ethereum price prediction, it is essential to acknowledge the inherent risks and uncertainties embedded in the cryptocurrency market. The dynamic nature of this market, coupled with external factors influencing Ethereum's value, poses forecasting challenges that cannot be overstated. The cryptocurrency market is characterized by rapid changes, influenced by a myriad of factors such as regulatory developments, technological advancements, market sentiment, and macroeconomic shifts. These dynamics make predicting Ethereum prices a challenging task, as evidenced by the volatility witnessed over the years.

The hybrid model performed well, enabling global Ethereum price predictions with greater precision and dependability. The development of EtherVoyant, a novel hybrid model, and the understanding of the advantages and limitations of ARIMA and SARIMA for predicting cryptocurrency prices are among our study's major achievements. Traders, investors, and industry members can all benefit from the insights provided by the study's findings and apply them in their daily work. In conclusion, our study demonstrates the promise of state-of-the-art time series forecasting algorithms for improving Ethereum price prediction around the world. The future of cryptocurrency market prediction could be shaped by additional improvements to the EtherVoyant model and its application to other cryptocurrencies. The findings of this study reflect the dynamic and ever-changing world of cryptocurrency trading and investing and pave the way for future developments in time series forecasting.