From regression models to machine learning approaches for long term Bitcoin price forecast

Caliciotti, Andrea; Corazza, Marco; Fasano, Giovanni

doi:10.1007/s10479-023-05444-w

Original Research
Published: 03 July 2023

Volume 336, pages 359–381, (2024)

Annals of Operations Research

Abstract

We carry out a long term analysis of the Bitcoin price. Bitcoin is currently among the most renowned crypto assets available on markets other than Forex. In the last decade it has been in the spotlight among traders worldwide, both because of its nature of pseudo–currency and because of the high volatility its price has frequently experienced. Considering that the Bitcoin price has gained over five orders of magnitude since 2009, the interest of investors has been increasingly motivated by the need to predict its value accurately, not to mention that a comparative analysis with other assets such as silver and gold has been under investigation, too. This paper reports two approaches for a long term Bitcoin price prediction. The first one follows more standard paradigms from regression and least squares frameworks. Our main contribution in this regard yields conclusions that justify the cyclic performance of the Bitcoin price in terms of its Stock–to–Flow. Our second approach is definitely novel in the literature, and indicates guidelines for long term forecasts of the Bitcoin price based on Machine Learning (ML) methods, with a specific reference to Support Vector Machines (SVMs). Both these approaches are inherently data–driven, and the second one does not require any of the assumptions typically needed by solvers for classic regression problems.

1 Introduction

In this paper we consider a challenging price forecast problem, associated with a specific asset class, namely crypto assets. In particular, we focus on Bitcoin (Nakamoto, 2008), one of the most famous crypto assets and currently the one with the largest market capitalization (see also Various Authors, 2011; Nakamoto, 2014; Vigna & Casey, 2015).

Bitcoin was created in 2008 (Nakamoto, 2008) by an anonymous researcher (or possibly a team of people), under the pseudonym of Satoshi Nakamoto. It represents a digital asset whose implementation release protocol is open–source. What strongly characterizes Bitcoin with respect to fiat currencies (such as Dollar, Euro, Yen, Pound, etc.) is its decentralized nature. Indeed, no private bank or national central bank is responsible for managing the overall amount of circulating bitcoins, nor is any of them able to issue new bitcoins. Bitcoin negotiations need exchanges to finalize transactions. These are special intermediaries that allow the negotiation of Bitcoin vs. the main fiat currencies or vs. other crypto assets. Nevertheless, peer–to–peer movements on the Bitcoin network can also be finalized without the need for intermediaries. Transactions among users are validated by network nodes (computers), after solving complex inverse cryptographic problems. Moreover, the transactions cannot be removed from the Bitcoin network, since they are sequentially collected into blocks appended to a public distributed ledger called blockchain. Newly minted bitcoins are created every time a block is added to the blockchain by special nodes (computers) of the network, associated with the so called miners, who are rewarded for solving the above complex inverse cryptographic problems. The miners' reward policy of the Bitcoin network changes roughly every four years, identifying events in the history of Bitcoin known as halvenings, since each of them halves the reward associated with a mined block.

In order to forecast the long term price of Bitcoin, a number of different approaches have been considered in the literature (the interested reader can refer to the recent papers (Aggarwal et al., 2020; Sreekanth Reddy & Sriramya, 2020) and the references therein), and interest in the topic has grown over the last decade. Considering the wide range of stakeholders for crypto assets, ranging from practitioners to investors, researchers and members of private/public institutions, the quality of the literature on Bitcoin price prediction has sometimes been methodologically questionable. Moreover, one of the main difficulties for Bitcoin price prediction lies in the high volatility of this asset (Baur & Dimpfl, 2021), whose price can show large oscillations in a short time period. The main reason for this drawback is that Bitcoin is a relatively recent asset. Thus, considering its market capitalization (which is currently about one tenth of the overall gold capitalization), Bitcoin is often the target of speculation, including highly leveraged transactions on Bitcoin derivatives (i.e. futures and options).

Among the most recent contributions to the assessment of the price of Bitcoin, we find the paper (Aggarwal et al., 2020), which introduces an ML–based approach to provide a quantitative model for Bitcoin price forecast. However, Aggarwal et al. (2020) basically rely on intrinsic mode functions (IMFs), coupled with SVMs, which attempt to capture the natural characteristics of the time series associated with Bitcoin prices. Another ML–based contribution is the analysis in Sreekanth Reddy and Sriramya (2020), which relies on the optimization method LASSO for ML.

Unlike the cited references, our main approach here is twofold. On one hand we investigate linear Least Squares models, to study the role played by the Stock–to–Flow ratio within regression problems related to Bitcoin price forecast. We recall that the Stock–to–Flow ratio is defined as the ratio between the current overall stock of Bitcoin on the market, and the quantity of bitcoins minted in a given time period. On the other hand, we show that we can use a ML–based technique (see Sect. 5), combining Mathematical Programming and SVMs, in order to provide a long term measurement system for estimating Bitcoin price. We will show that, under suitable assumptions, the two proposals provide similar results. We remark that our second proposal does not rely on those theoretical assumptions (e.g. the normal distribution of data) which are typically associated with regression formulations. More specifically, in our second proposal we combine a preliminary multiobjective programming approach with an SVM: this represents to our knowledge a completely novel framework in the literature.

Given the foregoing, it is possible to show that to some extent the price of Bitcoin depends on its Stock–to–Flow ratio (SF). For instance, Fig. 1 represents 1295 pairs of Bitcoin price vs. SF, corresponding to the period between January, 2011 and July, 2022.^{Footnote 1} The data have been transformed to a logarithmic scale to allow easy processing. The (red) bullets represent Bitcoin prices corresponding to SF values, and the high volatility of the Bitcoin price is readily apparent. Moreover, the time window length for computing the SF is 463 days, as indicated by (Buy Bitcoin Worlwide, 2019).^{Footnote 2} The rationale behind the determination of this value follows guidelines suggested by other scarce assets such as gold and silver. Briefly, given the average length of the production cycle of bitcoins between two consecutive halvenings (about 4 years, that is about 1,460 days), practitioners and professionals have identified within this cycle the following three consecutive market phases: bull run, correction, and reversion to the mean. The lengths of these three phases are estimated to be approximately equal, that is $4 /3$ years each, hence the (approximate) value of 463 days for the time window length used to compute the Bitcoin SF.^{Footnote 3} However, this choice of the length of the time window may itself be questionable, and it represents a key parameter, as detailed in Sect. 4. All this said, forecasting the Bitcoin price vs. its SF with a linear regression model, solved through a linear least squares problem, strongly depends on a number of issues and definitely requires specific care. For the sake of completeness we recall that the parallel continuous lines in Fig. 1 have equations $y= mx+q_1$ and $y= mx+q_2$, respectively, where m, $q_1$ and $q_2$ solve the linear programming problem (each pair $(\bar{x}_i,\bar{y}_i)$ represents the Bitcoin price, i.e. $\bar{y}_i$, corresponding to its SF, i.e. $\bar{x}_i$)

$$\begin{aligned} \begin{array}{l} \displaystyle \min _{m,q_2,q_1} \ q_2 - q_1 \\ \ \\ \quad \bar{y}_i \ge m \bar{x}_i + q_1, \qquad i= 1, \ldots , {1295}, \\ \ \\ \quad \bar{y}_i \le m \bar{x}_i + q_2, \qquad i= 1, \ldots , {1295}, \end{array} \end{aligned}$$

that is, the area between the two lines identifies the narrowest strip containing all the plotted Bitcoin data points.
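The narrowest–strip problem above can be illustrated with a minimal pure–Python sketch. It is not the authors' implementation: it relies on the observation that, for a fixed slope m, the best intercepts are the smallest and largest residuals $\bar{y}_i - m\bar{x}_i$, so that the width $q_2 - q_1$ is a convex function of m and can be minimized by a ternary search. The function names, the search bracket $[-100, 100]$ for the slope and the sample points are assumptions made only for illustration.

```python
def strip_width(m, xs, ys):
    """Width q2 - q1 of the narrowest strip of slope m containing all points."""
    residuals = [y - m * x for x, y in zip(xs, ys)]
    return max(residuals) - min(residuals)


def narrowest_strip(xs, ys, m_lo=-100.0, m_hi=100.0, iters=200):
    """Minimize the (convex) strip width over the slope m by ternary search.

    Assumption: the optimal slope lies in the bracket [m_lo, m_hi].
    Returns (m, q1, q2) with q1 <= y_i - m * x_i <= q2 for every point.
    """
    for _ in range(iters):
        m1 = m_lo + (m_hi - m_lo) / 3.0
        m2 = m_hi - (m_hi - m_lo) / 3.0
        if strip_width(m1, xs, ys) <= strip_width(m2, xs, ys):
            m_hi = m2  # a minimizer lies in [m_lo, m2]
        else:
            m_lo = m1  # a minimizer lies in [m1, m_hi]
    m = 0.5 * (m_lo + m_hi)
    residuals = [y - m * x for x, y in zip(xs, ys)]
    return m, min(residuals), max(residuals)


# Made-up points scattered around the line y = 2x (not Bitcoin data).
xs_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
ys_demo = [0.5, 1.5, 4.5, 5.5, 8.5]
m_demo, q1_demo, q2_demo = narrowest_strip(xs_demo, ys_demo)
print(m_demo, q1_demo, q2_demo)
```

For the made–up points above the search returns a slope close to 2 and a strip of width close to 1. With the actual 1295 pairs one would pass the log–transformed SF values and prices, or hand the original linear program to any LP solver.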

Note that the linear model suggested by Fig. 1, that is

$$\begin{aligned} \ln \left( {Price_{t}} \right) = \hat{m} \ln \left( SF_t \right) + \hat{q}, \end{aligned}$$

with $\hat{m}$ and $\hat{q}$ appropriate estimates of the slope and the intercept respectively, is very popular among practitioners and professionals (see for instance (Buy Bitcoin Worlwide, 2019; PlanB, 2019) and the very large number of more or less authoritative contributions on social media). However, these investigations are generally not well founded, with possible negative effects when used for professional trading purposes. One of the goals of this paper is precisely to provide the essentials of such foundations (see Sects. 3 and 4).

The remainder of this paper is organized as follows. In Sect. 2 we recall some details about the relationship between regression problems and least squares optimization approaches. Section 3 describes linear least squares schemes, where scaling on variables is suitably investigated in view of the analysis for Bitcoin price. In Sect. 4 we give indications about the computation of the SF ratio for Bitcoin, while Sect. 5 reports theoretical contributions and numerical results on long term Bitcoin price analyses which are based on Support Vector Machines (SVMs) and SVMs coupled with the bootstrap method. Finally, Sect. 6 reports some conclusions and guidelines for future work.

As regards the symbols used in this paper, we indicate with $\mathbb {E}[v]$ the expected value of the real random variable/vector v. With $A \succ 0$ $[A \succeq 0]$ we indicate that the matrix A is positive definite [positive semidefinite]. Finally, |B| denotes the cardinality of the set B.

2 Regression problems and least squares optimization

To better explain the reliability of solving linear least squares problems in the context of linear regression formulations, assume we are given the $1+p$ random variables Y and $\{X_i\}$, being

$$\begin{aligned} Y = \sum _{i=1}^p \beta _i X_i + u_i, \qquad \beta _i \in \mathrm{{I\!R}}, \quad i=1, \ldots ,p, \end{aligned}$$

(2.1)

where Y is the dependent variable and $\{X_i\}$ are the independent variables. Moreover, $u_i$ represents a statistical error, for any i, and satisfies $\mathbb {E}(u_i | X_1, \ldots , X_p)=0$. Then, if $Y, X_1, \ldots , X_p$ are independent and identically distributed (i.i.d.) and a few mild assumptions are fulfilled, the solution of the Linear Regression problem

$$\begin{aligned} \min _{\hat{a}, \hat{b}} \ \mathbb {E} \left[ \left( Y- \left( \hat{b} + \sum _{i=1}^p \hat{a}_i X_i \right) \right) ^2 \right] , \qquad \hat{a} \in \mathrm{{I\!R}}^p, \hat{b} \in \mathrm{{I\!R}} \end{aligned}$$

(2.2)

can be equivalently obtained by solving the Linear Least Squares problem

$$\begin{aligned} \min _{a,b} \sum _{j=1}^N \left[ Y^{(j)} - \left( b + \sum _{i=1}^p a_i X_i^{(j)} \right) \right] ^2, \qquad a \in \mathrm{{I\!R}}^p, b \in \mathrm{{I\!R}} \end{aligned}$$

(2.3)

where N represents the number of available samples for the random variables $(Y,X_1, \ldots , X_p)$. We strongly remark that the solution of the minimization problem (2.3) is definitely appealing, since it is an unconstrained convex quadratic model. However, we also highlight that the solutions of (2.2) and (2.3) might strongly differ in case the theoretical assumptions on the quantities $Y, X_1, \ldots , X_p, u_1, \ldots , u_p$ were not fulfilled. A typical example where we experience the last drawback is the case in which the samples do not follow a normal distribution. Conversely, in case the random variables $u_1, \ldots , u_p$ admit the joint normal distribution $N(0,\sigma ^2 I)$, with zero expected value and the same variance for all the variables, then the solutions of (2.2) and (2.3) coincide.

In applied sciences (2.3) is often solved by taking for granted the required theoretical assumptions, which unfortunately are frequently not satisfied, as specified above. Thus, a test on the reliability of the solutions of (2.3) is usually sought (e.g. the $R^2$ and the p–value indicators).

In this paper we are interested in estimating the price (i.e. Y) of Bitcoin vs. its SF (i.e. X), being $Y= aX + u$, with $a \in \mathrm{{I\!R}}$; however, the error u is not normally distributed, so that the solution of the linear least squares problem (2.3) might possibly represent a poor estimator [see also the practical analysis in Graybill and Iyer (1994)].

3 Our setting for data scaling in Least Squares problems

Following the guidelines of Sect. 2, we detail here the linear least squares setting we refer to. Note that in this section we propose some novel theoretical results, in order to address and give foundation to claims and questions raised by practitioners within the literature on Bitcoin [see for instance Buy Bitcoin Worlwide (2019); PlanB (2019)]. Generally, these results are simple from a mathematical standpoint (mainly, they refer to the classical Box–Cox transformation approach; see Box and Cox (1964)) but are meaningful from the point of view of the Bitcoin price forecast. For these reasons, below we consider a couple of different linear regression problem formulations, as general as possible, in order to take into account a number of possible parameters that can affect the Bitcoin price forecast.

Let us consider the N training pairs $(\bar{x}_i, \bar{y}_i) \in \mathrm{{I\!R}}^2$, $i=1, \ldots ,N$, and the following least squares problem, where we assume that the training pairs are possibly subject to a $\log $–transformation, $\alpha $ and $\beta $ being suitable positive parameters different from one

$$\begin{aligned} \min _{m, q \in \mathrm{{I\!R}}} \ \ \sum _{i=1}^N \left[ \log _{\alpha }(\omega \bar{y}_i) - m \log _{\beta }(\bar{x}_i) -q \right] ^2. \end{aligned}$$

(3.4)

In particular, in the pair $(\bar{x}_i, \bar{y}_i)$ the quantity $\bar{x}_i$ is associated with the Bitcoin SF value, while $\bar{y}_i$ represents the corresponding price of Bitcoin. The motivation for introducing the parameter $\omega $ in (3.4) will be clear later on, and further specific motivations for resorting to the log–scaled expression (3.4) are given in Buy Bitcoin Worlwide (2019). They basically reduce to the fact that, regarding the pair $(\bar{x}_i, \bar{y}_i)$ as a realization of a two–dimensional random variable (x, y), then (x, y) does not necessarily have a normal distribution. In this regard, a $\log $–transformation of at least one of the random variables may make the pairs $\{(\bar{x}_i, \bar{y}_i)\}$ better resemble a sample from a normal distribution.

Proposition 1

Given the problem (3.4) with $\alpha , \beta \not \in \{0,1\}$ and positive, let

$$\begin{aligned} \begin{array}{lcl} \displaystyle A = \sum _{i=1}^N [\log _{\beta }(\bar{x}_i)]^2, &{} \ \ {} &{} \displaystyle B = \sum _{i=1}^N [\log _{\alpha }(\bar{y}_i)]^2, \\ \displaystyle C = 2\sum _{i=1}^N \log _{\beta }(\bar{x}_i), &{} \ \ {} &{} \displaystyle D = 2\sum _{i=1}^N \log _{\alpha }(\bar{y}_i), \\ \displaystyle E = 2\sum _{i=1}^N \log _{\beta }(\bar{x}_i)\log _{\alpha }(\bar{y}_i). &{} &{} \end{array} \end{aligned}$$

(3.5)

Assume without loss of generality that $B \ne 0$ and the sequence $\{\log _{\beta }(\bar{x}_i)\}$ contains at least two non–coincident entries. Then the optimal solution $m^*, q^*$ to (3.4) satisfies the following properties:

1.
$m^*$ is independent of $\omega $, being
$$\begin{aligned} m^* = \frac{2NE-CD}{4NA-C^2}; \end{aligned}$$
2.
$q^*$ is not independent of $\omega $, being
$$\begin{aligned} q^* = \log _{\alpha }(\omega ) - \frac{CE-2AD}{4NA-C^2}. \end{aligned}$$

Proof

We rewrite (3.4) as

$$\begin{aligned} \min _{m, q \in \mathrm{{I\!R}}} \ \ \sum _{i=1}^N \left[ \log _{\alpha }(\omega ) + \log _{\alpha }(\bar{y}_i) - m \log _{\beta }(\bar{x}_i) -q \right] ^2, \end{aligned}$$

so that after computing the square in the sum and collecting the terms we equivalently obtain the problem

$$\begin{aligned}{} & {} \min _{m, q \in \mathrm{{I\!R}}} \ \ N \left[ \log _{\alpha }(\omega )\right] ^2 + Am^2 + Nq^2 + B - C \log _{\alpha }(\omega )m - 2N \log _{\alpha }(\omega )q + D \log _{\alpha }(\omega ) + Cmq - Em -Dq \end{aligned}$$

and after a few arrangements

$$\begin{aligned}{} & {} \min _{m, q \in \mathrm{{I\!R}}} \ \ N \left\{ \left[ \log _{\alpha }(\omega )\right] ^2 + q^2 - 2\log _{\alpha }(\omega ) q \right\} + Am^2 - C m \left[ \log _{\alpha }(\omega ) - q \right] \\{} & {} \quad + D \left[ \log _{\alpha }(\omega ) - q \right] - Em + B \end{aligned}$$

or equivalently

$$\begin{aligned} \min _{m, q \in \mathrm{{I\!R}}} \ \ N \left[ \log _{\alpha }(\omega ) - q \right] ^2 + Am^2 + (D - C m) \left[ \log _{\alpha }(\omega ) - q \right] - Em + B. \end{aligned}$$

Thus, after setting $s = \log _{\alpha }(\omega ) - q$ we equivalently have to solve the problem

$$\begin{aligned} \min _{m, s \in \mathrm{{I\!R}}} \ \ \psi (s,m) \equiv N s^2 + Am^2 + (D - C m)s - Em +B. \end{aligned}$$

Now observe that the Hessian matrix of the last quadratic function is constant and is given by

$$\begin{aligned} \nabla ^2 \psi (s,m) = \left( \begin{array}{cc} 2N &{} -C \\ &{} \\ -C &{} 2A \end{array} \right) , \end{aligned}$$

so that it is positive definite. Indeed $N,A \ge 0$ and $\det \left[ \nabla ^2 \psi (s,m) \right] = 4NA-C^2$, with $|C| \le 2 \Vert v\Vert _1$ and

$$\begin{aligned} v = \left[ \begin{array}{c} \log _{\beta }(\bar{x}_1) \\ \vdots \\ \log _{\beta }(\bar{x}_N) \end{array} \right] . \end{aligned}$$

Hence, $C^2 \le 4 \Vert v\Vert _1^2$ so that

$$\begin{aligned} \det \left[ \nabla ^2 \psi (s,m) \right]= & {} 4NA - C^2 \ \ge \ 4N \Vert v\Vert _2^2 - 4 \Vert v\Vert _1^2 \\= & {} 4 (\sqrt{N}\Vert v\Vert _2 + \Vert v\Vert _1)(\sqrt{N}\Vert v\Vert _2 - \Vert v\Vert _1). \end{aligned}$$

Since $\Vert v\Vert _1 \le \sqrt{N}\Vert v\Vert _2$ (see Fact 9.8.12 in Bernstein, 2009), then $\det \left[ \nabla ^2 \psi (s,m) \right] \ge 0$. Moreover, since at least two entries of v are non–coincident, the Cauchy–Schwarz bound $|C| \le 2\sqrt{N}\Vert v\Vert _2$ holds strictly (equality would require all the entries of v to coincide), so that $4NA - C^2 > 0$ and $\psi (s,m)$ is a strictly convex function in $\mathrm{{I\!R}}^2$. Now, first order stationarity conditions applied to $\psi (s,m)$ yield

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \frac{\partial \psi (s^*,m^*)}{\partial s} = 2Ns^* + (D-Cm^*) = 0 \\ \ \\ \displaystyle \frac{\partial \psi (s^*,m^*)}{\partial m} = 2Am^* -Cs^* - E = 0, \end{array} \right. \end{aligned}$$

so that

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle m^* = \frac{2NE - CD}{4NA - C^2} \\ \ \\ \displaystyle s^* = \frac{CE - 2AD}{4NA - C^2}, \end{array} \right. \end{aligned}$$

(3.6)

i.e. the statement of the proposition definitely holds. $\square $
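As a numerical illustration of Proposition 1, the following sketch evaluates the closed–form expressions for $m^*$ and $q^*$ from the sums in (3.5) and shows that rescaling the prices by $\omega $ leaves the slope unchanged while shifting the intercept by $\log _{\alpha }(\omega )$. The function name `prop1_solution` and the sample data (exact power–law points $y = 3x^2$, not Bitcoin data) are assumptions introduced only for this example.

```python
import math


def prop1_solution(xs, ys, omega=1.0, alpha=math.e, beta=math.e):
    """Closed-form minimizer (m*, q*) of problem (3.4) via the sums in (3.5)."""
    lx = [math.log(x, beta) for x in xs]   # log_beta(x_i)
    ly = [math.log(y, alpha) for y in ys]  # log_alpha(y_i)
    n = len(xs)
    A = sum(v * v for v in lx)
    C = 2.0 * sum(lx)
    D = 2.0 * sum(ly)
    E = 2.0 * sum(u * v for u, v in zip(lx, ly))
    # B only adds a constant to the objective, so it is not needed here.
    det = 4.0 * n * A - C * C              # positive if the lx are not all equal
    m_star = (2.0 * n * E - C * D) / det
    s_star = (C * E - 2.0 * A * D) / det
    q_star = math.log(omega, alpha) - s_star
    return m_star, q_star


# Made-up data lying exactly on y = 3 * x**2, so that m* = 2 and q* = ln(3) for omega = 1.
xs_demo = [1.0, 2.0, 4.0, 8.0]
ys_demo = [3.0 * x ** 2 for x in xs_demo]
print(prop1_solution(xs_demo, ys_demo, omega=1.0))
print(prop1_solution(xs_demo, ys_demo, omega=10.0))  # same slope, intercept shifted by ln(10)
```

Because $\omega $ enters the formulas only through the additive term $\log _{\alpha }(\omega )$ in $q^*$, the two calls return the same slope by construction.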

Similarly to the previous proposition we also have the following result, where the scaling parameter $\omega $ is applied to the sequence $\{\bar{x}_i\}$, i.e. now we refer to the case where problem (3.4) becomes

$$\begin{aligned} \min _{m, q \in \mathrm{{I\!R}}} \ \ \sum _{i=1}^N \left[ \log _{\alpha }( \bar{y}_i) - m \log _{\beta }(\omega \bar{x}_i) -q \right] ^2. \end{aligned}$$

(3.7)

Proposition 2

Given the problem (3.7) with $\alpha , \beta \not \in \{0,1\}$ and positive, let us consider the positions (3.5). Assume without loss of generality that $B \ne 0$, $A+ C \log _{\beta }(\omega ) > 0$ and the sequence $\{\log _{\beta }(\bar{x}_i)\}$ contains at least two non–coincident entries. Then the optimal solution $m^*, q^*$ to (3.7) satisfies the following properties:

1.
$m^*$ is independent of $\omega $, being
$$\begin{aligned} m^* = \frac{2NE-CD}{4NA-C^2}; \end{aligned}$$
2.
$q^*$ is not independent of $\omega $, being
$$\begin{aligned} q^* = - \frac{2NE - CD}{4NA - C^2} \log _{\beta }(\omega ) - \frac{CE-2AD}{4NA-C^2}. \end{aligned}$$

Proof

Following the guidelines of Proposition 1 we can equivalently rewrite (3.7) as

$$\begin{aligned} \min _{m, q \in \mathrm{{I\!R}}} \ \ \sum _{i=1}^N \left[ \log _{\alpha }(\bar{y}_i) - m \left[ \log _{\beta }(\omega ) + \log _{\beta }(\bar{x}_i) \right] -q \right] ^2, \end{aligned}$$

i.e.

$$\begin{aligned} \min _{m, q \in \mathrm{{I\!R}}} \ \ \psi (m,q)\equiv & {} B + Nm^2 \left[ \log _{\beta }(\omega ) \right] ^2 + Am^2 + Cm^2\log _{\beta }(\omega ) + Nq^2 - Dm\log _{\beta }(\omega ) \\{} & {} - Em - Dq + 2Nmq\log _{\beta }(\omega ) + Cmq. \end{aligned}$$

Observe that the Hessian matrix

$$\begin{aligned} \nabla ^2 \psi (m,q) = \left( \begin{array}{ccc} 2N \left[ \log _{\beta }(\omega ) \right] ^2 + 2A + 2C \log _{\beta }(\omega ) &{} \quad &{} 2N\log _{\beta }(\omega ) + C \\ \ &{} &{} \\ 2N\log _{\beta }(\omega ) + C &{} &{} 2N \end{array} \right) \end{aligned}$$

is positive definite. Indeed, its diagonal entry 2N is positive and its determinant equals $4NA - C^2$, which is strictly positive by the same argument used in the proof of Proposition 1 (moreover, the assumption $A + C \log _{\beta }(\omega ) > 0$ guarantees that the first diagonal entry, and hence the trace, is positive as well). Therefore $\psi (m,q)$ is strictly convex on $\mathrm{{I\!R}}^2$.

Now, first order stationarity conditions applied to $\psi (m,q)$ yield

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \frac{\partial \psi (m^*,q^*)}{\partial m} = 2Nm^*\left[ \log _{\beta }(\omega ) \right] ^2 + 2Am^* + (2Cm^* - D)\log _{\beta }(\omega ) - E \\ \hspace{2.35truecm} + \ 2Nq^*\log _{\beta }(\omega ) + Cq^* = 0 \\ \ \\ \displaystyle \frac{\partial \psi (m^*,q^*)}{\partial q} = 2Nq^* - D + 2Nm^*\log _{\beta }(\omega ) + Cm^* = 0, \end{array} \right. \end{aligned}$$

so that from the second equation

$$\begin{aligned} q^* = \frac{D - m^* \left[ 2N\log _{\beta }(\omega ) + C \right] }{2N} \end{aligned}$$

and replacing into the first equation we have after some computations

$$\begin{aligned} m^* = \frac{2NE-CD}{4NA-C^2}. \end{aligned}$$

Note also that, similarly to Proposition 1, the assumptions ensure that the denominator $4NA-C^2$ of $q^*$ is strictly positive. Finally, we also obtain after some computations

$$\begin{aligned} q^*= & {} \frac{D - m^* \left[ 2N\log _{\beta }(\omega ) + C \right] }{2N} \ = \ -m^*\log _{\beta }(\omega ) + \frac{D - m^*C}{2N} \\= & {} - \frac{2NE - CD}{4NA - C^2} \log _{\beta }(\omega ) - \frac{CE - 2AD}{4NA - C^2}, \end{aligned}$$

which completes the proof. $\square $
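As a cross–check of Proposition 2, and only as a sketch that is not part of the original proof, one can also reduce (3.7) to (3.4) with $\omega = 1$ through the change of variable $\tilde{q} = q + m \log _{\beta }(\omega )$, where the symbol $\tilde{q}$ is introduced here purely for illustration. Since the map $(m,q) \mapsto (m,\tilde{q})$ is one–to–one on $\mathrm{{I\!R}}^2$, the minimizers of the two problems correspond to each other:

$$\begin{aligned} \sum _{i=1}^N \left[ \log _{\alpha }(\bar{y}_i) - m \log _{\beta }(\omega \bar{x}_i) - q \right] ^2&= \sum _{i=1}^N \left[ \log _{\alpha }(\bar{y}_i) - m \log _{\beta }(\bar{x}_i) - \tilde{q} \right] ^2, \qquad \tilde{q} = q + m \log _{\beta }(\omega ), \\ m^*&= \frac{2NE-CD}{4NA-C^2}, \qquad \tilde{q}^* = - \frac{CE-2AD}{4NA-C^2} \qquad \text {(Proposition 1 with } \omega = 1\text {)}, \\ q^*&= \tilde{q}^* - m^* \log _{\beta }(\omega ) = - \frac{2NE - CD}{4NA - C^2} \log _{\beta }(\omega ) - \frac{CE-2AD}{4NA-C^2}. \end{aligned}$$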

Observation 3.1

We highlight that whatever the value of $\omega $ in the linear regression problems (3.4) and (3.7), Propositions 1 and 2 give the same optimal $m^*$s, and that in case $\omega =1$ in (3.4) and (3.7), then evidently the optimal $q^*$s from Propositions 1 and 2 (as expected) coincide. Moreover, the assumption $A + C \log _{\beta }(\omega ) > 0$ in Proposition 2 is not particularly restrictive for Bitcoin, as detailed in Sect. 4.

Observation 3.2

We stress that the outcomes according to which both (3.4) and (3.7) have optimal $m^*$s independent of $\omega $ imply that the value of $\omega $ does not affect the forecast of the Bitcoin price variation.

As for the roles played by the bases of the logarithms $\alpha $ and $\beta $, with respect to the natural one, we provide the following results.

Corollary 1

Let us consider the problem (3.4) with $\alpha =\alpha ', \beta =\beta ' \not \in \{0,1\}$ and positive, and let us consider the same problem with $\alpha =\beta =e$. Then the problem using the natural base has the optimal intercept $q_e^*$ such that:

$$\begin{aligned} q_e^* = \frac{1}{\log _{\alpha '}(e)}q^*. \end{aligned}$$

Proof

The result follows from a straightforward computation. $\square $

Corollary 2

Let us consider the problem (3.7) with $\alpha =\alpha ', \beta =\beta ' \not \in \{0,1\}$ and positive, and let us consider the same problem with $\alpha =\beta =e$. Then the problem using the natural base has the optimal intercept $q_e^*$ such that:

$$\begin{aligned} q_e^* = \frac{1}{\log _{\alpha '}(e)}q^*. \end{aligned}$$

Proof

Again, the result follows from a straightforward computation. $\square $

Corollary 3

Let us consider the problems (3.4) and (3.7) with $\alpha =\alpha ', \beta =\beta ' \not \in \{0,1\}$ and positive, and let us consider the same problems with $\alpha =\beta =e$. Then both the problems using the natural base have the same optimal slope $m_e^*$ such that:

$$\begin{aligned} m_e^* = \frac{\log _{\beta '}(e)}{\log _{\alpha '}(e)}m^*. \end{aligned}$$

Proof

The proof follows the guidelines for Corollaries 1 and 2. $\square $
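The straightforward computation behind Corollaries 1–3 can be sketched as follows for problem (3.4). This is a reconstruction offered for convenience, not the authors' own proof, and it only uses the change–of–base identity $\log _{\gamma }(z) = \log _{\gamma }(e) \ln (z)$, where $\gamma $ denotes a generic admissible base introduced here for illustration:

$$\begin{aligned} \sum _{i=1}^N \left[ \log _{\alpha '}(\omega \bar{y}_i) - m \log _{\beta '}(\bar{x}_i) - q \right] ^2 = \left[ \log _{\alpha '}(e) \right] ^2 \sum _{i=1}^N \left[ \ln (\omega \bar{y}_i) - \frac{\log _{\beta '}(e)}{\log _{\alpha '}(e)} \, m \ln (\bar{x}_i) - \frac{q}{\log _{\alpha '}(e)} \right] ^2. \end{aligned}$$

Since the positive factor $[\log _{\alpha '}(e)]^2$ does not affect the minimizer, the sum on the right is the natural–base problem in the variables $m_e = \frac{\log _{\beta '}(e)}{\log _{\alpha '}(e)} m$ and $q_e = q / \log _{\alpha '}(e)$, which gives the relations stated in the corollaries. The same steps apply to (3.7), with $\omega $ moved inside $\ln (\omega \bar{x}_i)$.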

Observation 3.3

We remark that in case $\alpha ' = \beta '$ with $\alpha ', \beta ' \not \in \{0,1\}$ and positive, then $m_e^* = m^*$. This sheds some light on possible doubts raised within the literature for practitioners, about the possible dependency of Bitcoin price forecast on the bases $\alpha , \beta $ in (3.4) and (3.7) (see PlanB (2019)).

4 The assessment of the SF for Bitcoin

As reported in Sect. 1, several authors in the literature have observed a relationship between the price of Bitcoin and its SF value (see e.g. Buy Bitcoin Worlwide (2019), among the first). Here we want to better investigate the last conclusion, in view of Propositions 1 and 2. In particular, we want to show that the SF value for Bitcoin strongly relies on the time window we use to compute it: this motivates the introduction of the parameter $\omega $ in the formulation (3.7), in case $\bar{y}_i$ represents the i–th price of Bitcoin and $\bar{x}_i$ is associated with the i–th value of the SF.

As a preliminary example, recalling that the SF is the ratio between the current stock of an asset and its flow within a given time window, assume the flow is computed in a time interval of one year (i.e. 365 days as in Fig. 1). Then, SF essentially represents the number of years, at the current annual production rate, that are necessary to obtain its current stock. Hence, the higher the SF the scarcer the asset. Now, let us consider in our example the pairs price vs. SF of Bitcoin, from January 1st, 2011, to July 15th, 2022^{Footnote 4}. Then, the formula to be adopted for the computation of the SF for Bitcoin should be (as an example the stock in the formula refers to the end of September, 2021)

$$\begin{aligned} SF_{Bitcoin} \approx \frac{18,700,000}{\frac{24 \cdot 60}{10} \cdot \textbf{463} \cdot 6.25} \approx 44.9. \end{aligned}$$

(4.8)

The quantities in the last computation take into account the following facts:

every day about $24 \cdot 60 / 10$ newly mined blocks are added to the Bitcoin blockchain (i.e. around one block every 10 min);
for each block newly added to the Bitcoin blockchain, exactly 6.25 bitcoins are minted and rewarded to miners, so that they can possibly be negotiated (i.e. they are potentially available) on the market. Moreover, the value 6.25 will be successively halved in $2020 + 4k$, for any $k=1,2, \ldots $, up to about 2140;
the time window for computing the flow is not 365 days (it is indeed 463 days, as suggested in Buy Bitcoin Worlwide (2019)), so that the corresponding value is used in the denominator of (4.8), in place of 365.
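The arithmetic in (4.8) can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `stock_to_flow` and its argument names are hypothetical, while the numbers (stock of about 18,700,000 bitcoins, one block every 10 minutes, reward of 6.25 bitcoins, window of 463 days) are those quoted above.

```python
def stock_to_flow(stock, window_days, block_reward, blocks_per_day=24 * 60 / 10):
    """Stock-to-Flow ratio: current stock divided by the coins minted in the window."""
    flow = blocks_per_day * window_days * block_reward
    return stock / flow


sf_463 = stock_to_flow(18_700_000, 463, 6.25)  # window used in (4.8)
sf_365 = stock_to_flow(18_700_000, 365, 6.25)  # one-year window, for comparison
print(round(sf_463, 1), round(sf_365, 1), sf_365 / sf_463)
```

Changing only the window length rescales the SF by the ratio of the two lengths, here 463/365, since the stock and the per–day production are unchanged.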

We comparatively observe that the current SF of gold is a bit larger than 60, so that we conclude that the scarcity of Bitcoin to some extent compares with that of gold (hence the widely used nickname of digital gold for Bitcoin). Figure 2 reports a plot of Bitcoin price vs. its SF: the dashed line is the regression line obtained from solving either problem (3.4) or (3.7), after setting $\omega =1$. The slope of the resulting regression line is $m^* \approx 2.68084$ and the intercept is $q^* \approx 0.40596$, while red points represent the pairs $(\ln (price), \ln (SF))$. Thus, the relation

$$\begin{aligned} \ln (price) = 2.680 \ln (SF) + 0.405 \end{aligned}$$

(4.9)

immediately yields the Bitcoin price forecast. For instance, considering the value of the SF in (4.8) (i.e. at the end of September, 2021), we obtain

$$\begin{aligned} price = SF^{2.680} \cdot e^{0.405} = 44.9^{2.680} \cdot e^{0.405} \approx 40,171 \$. \end{aligned}$$
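The forecast above is simply the exponential of the right–hand side of (4.9). A minimal sketch, in which the function name `price_from_sf` is an assumption and the default coefficients are the rounded values of (4.9):

```python
import math


def price_from_sf(sf, m=2.680, q=0.405):
    """Price implied by ln(price) = m * ln(SF) + q, i.e. price = SF**m * exp(q)."""
    return math.exp(m * math.log(sf) + q)


price = price_from_sf(44.9)  # SF value from (4.8)
print(round(price))
```

Because the model is a power law in the SF, small changes in the SF or in the rounded coefficients are amplified in the price, which is one reason the point forecast should be read together with the $R^2$ reported below.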

For the sake of completeness observe that $R^2$ and p-value, associated with the computation of the linear regression model (4.9), are given by

$$\begin{aligned} R^2 = 0.639, \qquad p \ll 0.05, \end{aligned}$$

which shows that the linear regression model is only partially reliable. Furthermore, we remark that there is no discrepancy between the outcomes of Figs. 1 and 2, since in both the figures the flow, in the SF for Bitcoin, is computed with respect to the same time window (i.e. 463 days). This conclusion highlights the importance of introducing the parameter $\omega $ in (3.7), inasmuch as the following observations hold:

when in (3.7) $\bar{x}_i$ represents a value of Bitcoin SF and we set $\omega =1$, then it means we are adopting exactly the formula (4.8) for the SF;
when in (3.7) $\bar{x}_i$ represents a value of Bitcoin SF and we set $\omega =463/365$, then it means we are adopting the formula (4.8), with the number 365 in place of 463 (shortening the time window reduces the flow, and hence multiplies the SF by the factor 463/365).

Hence, the parameter $\omega $ duly takes into account the time window adopted for computing the SF of Bitcoin, and according to Proposition 2 it does not influence the slope $m^*$ of the corresponding regression line. This is, to our knowledge, a remarkable novel result in the literature.

Observation 4.1

We highlight that the choice of the value of $\omega $ affects the values of the optimal $q^*$s given by Propositions 1 and 2, and consequently influences the Bitcoin price forecast. Therefore, in order to improve the quality of the forecast, a more general optimization process might be considered, also treating $\omega $ itself as an unknown, in addition to q and m. This will be a subject of our future research on this topic. Nevertheless, in general, for trading purposes what really matters is possibly not the forecast of the asset price, but rather the forecast of the asset price variation. In this regard, as stressed in Observation 3.2, the value of $\omega $ does not affect the Bitcoin price variation forecast. Hence, from an operational point of view, the choice of this parameter might not be so crucial.

Lastly, given that the linear model is inherently data–driven, we point out that the assumptions needed by the solver to address the corresponding regression problem have to be satisfied. In this regard, we recall that one of the main assumptions underlying the considered regression problem is that the probability distribution of Y conditional on $(X_1, \ldots , X_i, \ldots , X_p)$ is normal (see (2.1)).

In order to verify this assumption, we carried out the test of Jarque and Bera (1987) as follows:

$$\begin{aligned} \left\{ \begin{array}{l} H_0\mathrm{:} \;\, \text {the probability distribution of }{} { u} \text { is normal} \\ H_1\mathrm{:} \;\, \text {the probability distribution of }{} { u} \text { is not normal} \\ \end{array} \right. \end{aligned}$$

where $u \doteq (u_1, \ldots , u_N)$ is the vector of the residuals coming from the regression problem

$$\begin{aligned} \min _{m,q} \sum _{j=1}^N \left[ \ln (price)^{(j)} - m \ln (SF)^{(j)} - q \right] ^2, \qquad m, q \in \mathrm{{I\!R}}\mathrm{.} \end{aligned}$$

In particular, we performed the test for the overall period considered in the example presented in this section, that is from January 1st, 2011 to July 15th, 2022, and for each of the thirteen time-windows considered in Sect. 5.2. For all such time periods, the null hypothesis was not rejected at the 5% significance level, with p-values ranging in the interval [0.1170, 0.1666]. Hence, we decided to admit the use of the linear regression approach when reporting the comparative results in Sect. 5.2.
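The test above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the reported p-values: the data are synthetic, the function names are hypothetical, and the p-value relies on the asymptotic chi-square distribution with 2 degrees of freedom, whose survival function is $e^{-JB/2}$.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + q; returns (m, q)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def jarque_bera(u):
    """Jarque-Bera statistic and asymptotic p-value for the sample u."""
    n = len(u)
    mean = sum(u) / n
    m2 = sum((v - mean) ** 2 for v in u) / n   # biased central moments
    m3 = sum((v - mean) ** 3 for v in u) / n
    m4 = sum((v - mean) ** 4 for v in u) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)             # chi-square(2) tail probability

# Synthetic ln(SF) and ln(price) values, generated only for illustration.
random.seed(1)
ln_sf = [0.05 * j for j in range(1, 201)]
ln_price = [2.7 * x + 1.0 + random.gauss(0.0, 0.5) for x in ln_sf]

m, q = fit_line(ln_sf, ln_price)
residuals = [y - m * x - q for x, y in zip(ln_sf, ln_price)]
jb_stat, p_value = jarque_bera(residuals)
print("JB =", jb_stat, "p-value =", p_value, "reject H0 at 5%:", p_value < 0.05)
```

For short samples the asymptotic p-value is only a rough approximation, so a library routine or tabulated critical values may be preferable in practice.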

5 Our second approach for long term Bitcoin price prediction

In this section we describe our second approach for the long term Bitcoin price forecast. In particular, first, in Sect. 5.1, we show that combining a Multiobjective Technique (MT), from Nonlinear Programming, with an SVM, from ML (see also Cristianini and Shawe-Taylor, 2000; Vapnik, 1995; 1998; Hastie et al., 2008; Deng et al., 2013; Marsland, 2015), we can recover results similar to those from the first approach, without requiring any assumption typically needed by regression frameworks. Then, in Sect. 5.2 we couple the above SVM–based approach with a bootstrap method, in order to possibly improve the quality of its performance. Finally, we apply both methodologies to the long term forecast of Bitcoin price.

5.1 The algorithm MT–SVM

In order to combine MTs with SVMs, let $(\bar{x}_i,\bar{y}_i)$, $i=1, \ldots ,N$, be the pairs of SF and Bitcoin price, within a given time interval. Then, we preliminarily identify the two subsets ${L}_{\max }$ and ${L}_{\min }$ of $\{ (\bar{x}_i,\bar{y}_i) \}$, each associated with a different weak Pareto front. In particular, consider Fig. 3, where $L_{\max }$ is indicated by $L_{West-North}$ and $L_{\min }$ by $L_{East-South}$. These sets are computed as

$L_{East-South}$: the weak Pareto front (red points) associated with both the maximization of the stock–to–flow SF and the minimization of Bitcoin price;
$L_{West-North}$: the weak Pareto front (cyan points) associated with both the minimization of the stock–to–flow SF and the maximization of Bitcoin price.

For a more formal definition of these last sets of points, the reader can consider that the point $(\bar{x}_i,\bar{y}_i)$ will be classified as a point in ${L}_{\max }$ if it satisfies the properties (non–dominated point with respect to maximization of $\bar{y}$ and minimization of $\bar{x}$)

$$\begin{aligned} \begin{array}{c} \left( \bar{x}_i, \bar{y}_i \right) \in L_{\max } \ \ \textrm{if} \ \ \not \exists j \in \{1, \ldots , N\}, \ \ \textrm{with} \ \ (\bar{x}_j, \bar{y}_j) \ne \left( \bar{x}_i, \bar{y}_i \right) , \\ \mathrm{s.t.} \ \ \ \bar{x}_j < \bar{x}_i \ \textrm{and} \ \bar{y}_j > \bar{y}_i. \end{array} \end{aligned}$$

Similarly, the point with coordinates $(\bar{x}_i,\bar{y}_i)$ will be classified as a point in ${L}_{\min }$ if it satisfies the properties (non–dominated point with respect to minimization of $\bar{y}$ and maximization of $\bar{x}$)

$$\begin{aligned} \begin{array}{c} \left( \bar{x}_i, \bar{y}_i \right) \in L_{\min } \ \ \textrm{if} \ \ \not \exists j \in \{1, \ldots , N\}, \ \ \textrm{with} \ \ (\bar{x}_j, \bar{y}_j) \ne \left( \bar{x}_i, \bar{y}_i \right) , \\ \mathrm{s.t.} \ \ \ \bar{x}_j > \bar{x}_i \ \textrm{and} \ \bar{y}_j < \bar{y}_i. \end{array} \end{aligned}$$

Broadly speaking, the front $L_{\max }$ includes (desirable) points with high price performance for Bitcoin vs. its SF. On the contrary, $L_{\min }$ contains (undesirable) points with poor price performance for Bitcoin vs. its SF. Therefore, the sets $L_{\max }$ and $L_{\min }$ include points which may be associated with extreme opposite performances of Bitcoin price, and can be used in our SVM–based classification framework to possibly model long term Bitcoin price.
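The two fronts can be extracted directly from their definitions. The following is a minimal sketch on a small hypothetical point set (not Bitcoin data); the function name `pareto_fronts` is an assumption, and the quadratic-time scan is chosen for clarity rather than efficiency.

```python
def pareto_fronts(points):
    """Return (L_max, L_min) for a list of (x, y) pairs.

    L_max: points for which no other point has smaller x and larger y.
    L_min: points for which no other point has larger x and smaller y.
    """
    l_max = [p for p in points
             if not any(q[0] < p[0] and q[1] > p[1] for q in points)]
    l_min = [p for p in points
             if not any(q[0] > p[0] and q[1] < p[1] for q in points)]
    return l_max, l_min

# Hypothetical toy data: an upper band, a lower band and interior points.
upper = [(0.0, 2.0), (1.0, 4.0), (2.0, 6.0), (3.0, 8.0)]
lower = [(0.5, -1.0), (1.5, 1.0), (2.5, 3.0), (3.5, 5.0)]
middle = [(0.75, 1.8), (1.75, 3.2), (2.25, 4.1), (2.75, 5.9)]
points = upper + lower + middle

l_max, l_min = pareto_fronts(points)
print("L_max (West-North):", l_max)
print("L_min (East-South):", l_min)
```

On this toy set the two fronts are disjoint. In general a point may belong to both fronts (for instance when the data are perfectly monotone), so the disjointness and separability of $L_{\max }$ and $L_{\min }$ should be checked on the data at hand.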

In the remaining part of this section we propose an iterative procedure which relies on the following definition (all the theoretical results in the current section refer to the set $\{(\bar{x}_i, \bar{y}_i)\}$, with $\bar{x}_i \in \mathrm{{I\!R}}$ and $\bar{y}_i \in \mathrm{{I\!R}}$. Nevertheless, in Pontiggia and Fasano (2021) the authors proved that they can be immediately extended to the set $\{(\bar{x}_i, \bar{y}_i)\}$, with $\bar{x}_i \in \mathrm{{I\!R}}^p$, $p \ge 2$, $\bar{y}_i \in \mathrm{{I\!R}}$, too).

Definition 1

Given the points $(\bar{x}_i,\bar{y}_i) \in \mathrm{{I\!R}}^2$, $i=1, \ldots , N$, and the values $z_i \in \{-1,+1\}$, $i=1, \ldots , N$, let us define the nonempty sets $A = \{ (\bar{x}_i,\bar{y}_i) \, : \ z_i = +1\}$ and $B = \{ (\bar{x}_i,\bar{y}_i) \, : \ z_i = -1 \}$. Then, we say that A and B are linearly separable in $\mathrm{{I\!R}}^2$ if there exists a line $H(\beta ,\beta _0;x,y)=0$, with coefficients $\beta , \beta _0 \in \mathrm{{I\!R}}$, such that

$$\begin{aligned} \left\{ \begin{array}{lcl} H(\beta ,\beta _0;\bar{x}_i, \bar{y}_i) > 0, &{} \qquad &{} \forall i \ : \ z_i=+1 \\ &{} &{} \\ H(\beta ,\beta _0;\bar{x}_i, \bar{y}_i) < 0, &{} \qquad &{} \forall i \ : \ z_i=-1. \end{array} \right. \end{aligned}$$

(5.10)

The procedure in our proposal starts by setting $A_0 = {L}_{\max }$ and $B_0 = {L}_{\min }$; then, we generate the sequences of sets $\{A_k\}$ and $\{B_k\}$, with $k = 0,1,2, \ldots $, according to the following two distinct phases:

first, we solve an SVM classification problem that computes the line
$$\begin{aligned} H \left( \beta ^{(k)},\beta ^{(k)}_0;x,y \right) =0 \end{aligned}$$
which linearly separates (see Definition 1) the sets $A_{k}$ and $B_{k}$, and is maximally distant (maximum margin) from their points;
second, a point $\left( \bar{x}_{\max }^{(k)},\bar{y}_{\max }^{(k)}\right) $ in $\{(\bar{x}_i,\bar{y}_i), \ i=1, \ldots ,N \} \setminus \{A_k \cup B_k\}$ is identified and assigned a label $z_k \in \{-1,+1\}$. In case $z_k = +1$ we generate the new sets $A_{k+1} = A_{k} \cup \left\{ \left( \bar{x}_{\max }^{(k)},\bar{y}_{\max }^{(k)}\right) \right\} $ and $B_{k+1} = B_{k}$, otherwise we set $A_{k+1} = A_{k}$ and $B_{k+1} = B_{k} \cup \left\{ \left( \bar{x}_{\max }^{(k)},\bar{y}_{\max }^{(k)}\right) \right\} $.

More formally, our proposal is summarized in the Algorithm MT–SVM of Table 1, whose steps are briefly commented as follows.

Table 1 Description of our SVM–based procedure applied to the points $\{(\bar{x}_i,\bar{y}_i), \ i=1, \ldots ,N \}$ in the set $\chi $


In a preliminary initialization we set $\chi = \{(\bar{x}_i,\bar{y}_i), \ i=1, \ldots ,N \}$ and set $A_0$ and $B_0$, using $L_{\max }$ and $L_{\min }$ defined above. Then, we compute the best (i.e. the one maximizing the margin) separating line $H \left( \beta ^{(0)},\beta _0^{(0)};x,y \right) =0$ between $A_0$ and $B_0$, whose parameters are given by $\beta ^{(0)}$ and $\beta _0^{(0)}$, using an SVM method. Thus, $H \left( \beta ^{(0)},\beta _0^{(0)};x,y \right) =0$ is the maximally distant line with respect to both the sets $A_0$ and $B_0$. In particular, in Algorithm MT–SVM we indicate with $SVM(A_k,B_k)$ the solution of the SVM problem which computes the best (maximally distant) separating line between $A_k$ and $B_k$. Note that the SVM problem reduces to a convex linearly constrained quadratic minimization problem which always admits solution (see Cristianini and Shawe-Taylor, 2000).
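For reference, the quadratic problem mentioned above can be written in the standard hard-margin form found in the SVM literature (see Cristianini and Shawe-Taylor, 2000). The symbols $w = (w_1, w_2)$ and $b$ below are our own shorthand for the coefficients of the separating line and are not the notation of Table 1; this form is feasible when $A_k$ and $B_k$ are linearly separable, while soft-margin variants add slack variables to handle the non-separable case.

$$\begin{aligned} \min _{w \in \mathrm{{I\!R}}^2, \ b \in \mathrm{{I\!R}}} \ \frac{1}{2} \Vert w \Vert ^2 \qquad \mathrm{s.t.} \quad z_i \left( w_1 \bar{x}_i + w_2 \bar{y}_i + b \right) \ge 1, \qquad \forall \, (\bar{x}_i, \bar{y}_i) \in A_k \cup B_k, \end{aligned}$$

where $z_i = +1$ for the points in $A_k$ and $z_i = -1$ for the points in $B_k$. At the solution, the separating line is $w_1 x + w_2 y + b = 0$ and the width of the empty stripe between the two sets equals $2 / \Vert w \Vert $.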

Therefore, at any step k we first pick the point $\left( \bar{x}_{\max }^{(k)}, \bar{y}_{\max }^{(k)}\right) $ in $\chi {\setminus } \{A_k \cup B_k\}$ with the largest distance from $H \left( \beta ^{(k)},\beta _0^{(k)};x,y \right) =0$. Then, depending on the half space where $\left( \bar{x}_{\max }^{(k)}, \bar{y}_{\max }^{(k)}\right) $ is located with respect to the line $H \left( \beta ^{(k)},\beta _0^{(k)};x,y \right) =0$, we build the new sets $A_{k+1}$, $B_{k+1}$ starting from the pair $A_k$, $B_k$. In the end, we increase the step and iterate the procedure. The following results, which establish theoretical properties of Algorithm MT–SVM, can be proved (see also Pontiggia and Fasano, 2021).

Lemma 1

Consider the set $\chi \subset \mathrm{{I\!R}}^2$, with $|\chi | < +\infty $. Let ${L}_{\max }, {L}_{\min } \subseteq \chi $. Then, the Algorithm MT–SVM in Table 1 provides the pair of sets $A_m$, $B_m$ after m steps, with

$$\begin{aligned} m= \left| \chi \setminus \{{L}_{\max } \cup {L}_{\min }\} \right| , \end{aligned}$$

such that

$$\begin{aligned} \left\{ \begin{array}{l} A_m \cup B_m = \chi \\ \ \\ A_m \cap B_m = \emptyset . \end{array} \right. \end{aligned}$$

Proof

By Table 1, recalling that $|\chi |$ is finite, the index k ranges from 0 to $\left| \chi {\setminus } \{ A_0 \cup B_0 \} \right| \le |\chi | < +\infty $. Moreover, since by construction $|A_k \cup B_k| = |A_{k-1} \cup B_{k-1}| + 1$, then m is exactly given by the number of points in $\chi $ which are neither present in ${L}_{\max }$ nor in ${L}_{\min }$. $\square $
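The iterative procedure described above, together with the partition property of Lemma 1, can be illustrated on a toy example. The sketch below is not the authors' implementation: instead of a quadratic programming solver, it finds the maximum-margin line by brute force (for finite planar sets the optimal normal is parallel to a segment joining a point of one set to a point of the other, or perpendicular to a segment joining two points of the same set). All names and data are hypothetical, and the initial sets are assumed disjoint and linearly separable.

```python
import math
from itertools import combinations

def max_margin_line(A, B):
    """Brute-force hard-margin separator of two finite planar point sets.

    Returns (n, c, W): unit normal n and offset c of the line n.p = c, with
    A on the side n.p > c, and the width W of the empty stripe.
    """
    normals = [(a[0] - b[0], a[1] - b[1]) for a in A for b in B]
    for S in (A, B):
        for p, q in combinations(S, 2):
            ex, ey = q[0] - p[0], q[1] - p[1]
            normals += [(-ey, ex), (ey, -ex)]
    best = None
    for wx, wy in normals:
        length = math.hypot(wx, wy)
        if length == 0.0:
            continue
        n = (wx / length, wy / length)
        low_A = min(n[0] * x + n[1] * y for x, y in A)
        high_B = max(n[0] * x + n[1] * y for x, y in B)
        if best is None or low_A - high_B > best[2]:
            best = (n, 0.5 * (low_A + high_B), low_A - high_B)
    if best is None or best[2] <= 0.0:
        raise ValueError("A and B are not linearly separable")
    return best

def mt_svm(points, A0, B0):
    """Grow A and B one point per step, as in the two phases described above."""
    A, B = list(A0), list(B0)
    rest = [p for p in points if p not in A and p not in B]
    history = []
    while True:
        n, c, W = max_margin_line(A, B)          # phase 1: separating line
        history.append((n, c, W))
        if not rest:
            return A, B, history
        def signed(p):
            return n[0] * p[0] + n[1] * p[1] - c
        far = max(rest, key=lambda p: abs(signed(p)))   # phase 2: farthest point
        (A if signed(far) > 0 else B).append(far)
        rest.remove(far)

# Hypothetical toy data: 'upper' and 'lower' play the role of L_max and L_min.
upper = [(0.0, 2.0), (1.0, 4.0), (2.0, 6.0), (3.0, 8.0)]
lower = [(0.5, -1.0), (1.5, 1.0), (2.5, 3.0), (3.5, 5.0)]
middle = [(0.75, 1.8), (1.75, 3.2), (2.25, 4.1), (2.75, 5.9)]
points = upper + lower + middle

A, B, history = mt_svm(points, upper, lower)
n0, c0, W0 = history[0]
slope0, intercept0 = -n0[0] / n0[1], c0 / n0[1]
print("initial line: y =", slope0, "* x +", intercept0, " stripe width:", W0)
print("steps:", len(history) - 1, " |A| =", len(A), " |B| =", len(B))
```

In this example the number of steps equals the number of points outside the two initial sets, and the final sets form a partition of the whole point set, in agreement with Lemma 1. The brute-force search has cubic cost and is meant only for small illustrations.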

Proposition 3

Let the nonempty sets $L_{\max }, L_{\min } \subseteq \chi $ be given, and consider the Algorithm MT–SVM in Table 1. If $L_{\max }$ and $L_{\min }$ are linearly separable, in the sense of Definition 1, then the sets $A_k$ and $B_k$ are linearly separable, for any $k \ge 0$.

Proof

The proof can be found in Pontiggia and Fasano (2021). $\square $

We can also give a couple of additional results, to better clarify the properties of the solution provided by Algorithm MT–SVM, both when $L_{\max }$ and $L_{\min }$ are linearly separable and when they are not.

Lemma 2

Let the nonempty sets $L_{\max }, L_{\min } \subseteq \chi $ be given, and consider the Algorithm MT–SVM in Table 1. If $L_{\max }$ and $L_{\min }$ are linearly separable, then

for any $k \ge 1$ the margin $W^{(k)}$ of the SVM problem $SVM(A_{k},B_{k})$ satisfies
$$\begin{aligned} W^{(k)} = \min \left\{ W^{(k-1)}, 2d_{\max }^{(k)} \right\} ; \end{aligned}$$
(5.11)
the sequence $\left\{ W^{(k)}\right\} $ is monotonically nonincreasing, with
$$\begin{aligned} W^{(k)} \le 2d_{\max }^{(j)}, \qquad j=0, \ldots , k. \end{aligned}$$
(5.12)

Moreover, assume that at step k of the Algorithm MT–SVM we set $d_{\max }^{(k)} \leftarrow \hat{d}$, being $\hat{d} = d[ (\hat{x}, \hat{y}), H(\beta ^{(k)},\beta _0^{(k)};x,y)]$ with

$$\begin{aligned} \displaystyle \hat{d} \not \in {\arg \max }_{(x_i,y_i) \in \chi \setminus \{A_{k} \cup B_{k}\} } \left\{ d[(x_i,y_i), H(\beta ^{(k)},\beta _0^{(k)};x,y)] \right\} . \end{aligned}$$

Then, we have $W^{(k)} \le 2 \hat{d}$.

Proof

The proof can be found in Pontiggia and Fasano (2021). $\square $

Observation 5.1

Let the nonempty sets $L_{\max }, L_{\min } \subseteq \chi $ be given, and consider the Algorithm MT–SVM in Table 1. Then, it can also be proved (see Pontiggia and Fasano, 2021) that under mild assumptions the misclassified points when solving $SVM(A_{k+1},B_{k+1})$ are a subset of the misclassified points when solving $SVM(A_k,B_k)$. This underlines once more the importance of selecting in our procedure $A_0=L_{\max }$ and $B_0=L_{\min }$, being $L_{\max }$ and $L_{\min }$ likely separable in practice.

Figure 4 shows the overall outcome of Algorithm MT–SVM for $k=0$. In particular, we have $A_0 = {L}_{\max } = L_{West-North}$ and $B_0 = {L}_{\min }=L_{East-South}$, being $A_0$ and $B_0$ linearly separable as in Definition 1. The picture also reports three parallel lines: one central line and two side lines. The central line represents the separating line computed by $SVM (A_0,B_0)$, i.e. the line of equation $H \left( \beta ^{(0)},\beta _0^{(0)};x,y \right) =0$. Conversely, the two side lines delimit the largest region (stripe) where none of the points in $A_0\cup B_0$ is included. Finally, the circled points in the picture represent the so-called support vectors (see Cristianini and Shawe-Taylor, 2000), i.e. those points in $A_0\cup B_0$ which are the closest to the central line of equation $H \left( \beta ^{(0)},\beta _0^{(0)};x,y \right) =0$. Of course, applying the procedure in the Algorithm MT–SVM also for $k \ge 1$, the stripe delimited by the two side lines will reduce its thickness, so that the three lines tend to become closer and closer as k increases. Their slope identifies a possible trend–line (which is expected to change with k) for Bitcoin price vs. its SF. As shown in Figs. 3 and 4, the common slope of the three lines when $k=0$ is $m^* \approx 2.9324$, which is not particularly close to the value 2.680 obtained with our first proposal in Sect. 4. In particular, similarly to (4.9) we have now the line

$$\begin{aligned} \ln (price) = 2.9324 \ln (SF) -0.35052, \end{aligned}$$

so that considering for instance the value of the SF in (4.8) (i.e. at the end of September, 2021), we obtain

$$\begin{aligned} price = SF^{2.9324} \cdot e^{-0.35052} = 44.9^{2.9324} \cdot e^{-0.35052} \approx 49,297 \$. \end{aligned}$$

Observe that this last Bitcoin price forecast is appreciably different from the one obtained using (4.9). Moreover, observing historical data of Bitcoin prices, we can immediately realize that $40,171 \$ $ was a bit closer to the actual value (i.e. $\approx 41,500 \$ $) we experienced in practice than the value $49,297 \$ $ obtained with our second approach. Figure 4 includes the same information as Fig. 3, and additionally reports the (remaining) points of the set $\{(\bar{x}_i,\bar{y}_i), \ i=1, \ldots ,N \}$ which are not in $L_{\max } \cup L_{\min }$.

Observing the parallel lines in Figs. 3 and 4, our SVM–based approach suggests also that the two side lines (i.e. the support hyperplanes following the taxonomy of SVMs) have respectively the equations

$$\begin{aligned} \left\{ \begin{array}{l} \ln (price) = 2.9324 \ln (SF) + (-0.35052 + 0.11629) = 2.9324 \ln (SF) - 0.23423, \\ \ \\ \ln (price) = 2.9324 \ln (SF) + (-0.35052 - 0.11629) = 2.9324 \ln (SF) - 0.46681, \end{array} \right. \end{aligned}$$

so that considering again the value of the SF in (4.8), we obtain for Bitcoin price forecast at the end of September 2021 the range of possible values

$$\begin{aligned} price \in \left[ 44.9^{2.9324} e^{- 0.46681} \ , \ 44.9^{2.9324} e^{- 0.23423} \right] \equiv [ \ 43,884.8 \$ \ , \ 55,376.1 \$ \ ]. \end{aligned}$$

(5.13)
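The arithmetic behind the point forecast and the interval (5.13) can be reproduced with a few lines of code. The sketch below only uses the coefficients and prices quoted in the text; the variable and function names are hypothetical.

```python
import math

# Coefficients of the central line and half-width of the stripe (from the text).
m, q, half_width = 2.9324, -0.35052, 0.11629
sf = 44.9                 # SF value at the end of September 2021, see (4.8)
actual = 41500.0          # approximate observed price quoted in the text
ls_forecast = 40171.0     # forecast of the regression model, see (4.9)

def price_from_line(sf_value, slope, intercept):
    """Invert ln(price) = slope * ln(SF) + intercept."""
    return math.exp(slope * math.log(sf_value) + intercept)

central = price_from_line(sf, m, q)
lower = price_from_line(sf, m, q - half_width)
upper = price_from_line(sf, m, q + half_width)

# Relative deviations from the observed price.
eps_svm = (lower - actual) / actual
eps_ls = (ls_forecast - actual) / actual

print(f"central forecast: {central:,.1f} $")
print(f"interval: [{lower:,.1f} $, {upper:,.1f} $]")
print(f"eps_SVM = {eps_svm:.4f}, eps_LS = {eps_ls:.4f}")
```

The lower end of the stripe lies about 5.7% above the observed price, while the regression forecast lies about 3.2% below it.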

This last result reveals that the actual price of 41,500 $ for Bitcoin, at the end of September 2021, was not included in the interval indicated by (5.13), though the relative error $\epsilon _{SVM} = [ 43,884.8 - 41,500]/41,500 \approx 5.7\%$, using our SVM–based approach (lowest extreme of the interval in (5.13)), and the relative error $\epsilon _{LS} = [40,171 - 41,500 ]/41,500 \approx -3.2\%$, using a more standard regression approach, are not far apart. This explains why we chose to report our numerical experience with reference to the end of September 2021: it was the case in which our second proposal performed most poorly with respect to the regression analysis. A couple of final considerations should also be highlighted:

our second approach in the current section does not require any of the assumptions typically needed when solving linear least squares problems. This implies that our second approach is not subject to any specific validation test;
like any approach based on linear regression, our first proposal is capable of producing both a point relation between the Bitcoin price and its SF (see (4.9)) and a confidence interval for the price itself. Nevertheless, the calculation of this interval is easy only under suitable theoretical assumptions on the probability distribution of the data (e.g. the normal one which is typically associated with regression formulations). Conversely, our second approach always gives an interval of reliability for the pair of Bitcoin price vs. its SF, identified by the stripe delimited by the side lines in Fig. 3, regardless of the data probability distribution. Thus, in this regard our second proposal shows to some extent higher versatility and possibly robustness of the outcomes. Moreover, it provides tentative information about long term Bitcoin price, which expert investors may decide to refine or even integrate with their own trusted methods.

5.2 Enhancement using bootstrap

Bootstrap represents a widely used technique to infer statistics on a population, by performing re–sampling with replacement of the original dataset associated with the population. More often, after defining a reference measure, bootstrap first implies re–sampling the original dataset so that this measure is recomputed several times. Then, exploiting the Central Limit Theorem (CLT), this measure is treated as a random variable whose simple statistics (i.e. the mean value and the standard deviation) are sought. In this regard we recall that, according to the CLT, when i.i.d. random variables are summed up (or averaged), their properly normalized sum approaches a normal distribution, regardless of the original distribution of the random variables (see Davison and Hinkley, 1997).

Bootstrap can be implemented in several practical ways, though it basically reduces to repeatedly selecting a sample in a given population, calculating the statistics associated with some measure, and finally taking the average of the computed statistics. To be more precise, a general bootstrap technique can be summarized by the following scheme:

1. Select the number NS of re–samplings and the sample size SS;
2. For $i=1$ to NS:
(a) compute the i–th sample (of size SS) by sampling the population with replacement;
(b) compute $k \ge 1$ quantities related to the i–th sample;
3. Compute the statistics associated with the k quantities.
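The scheme above can be sketched as follows, for the case of a single quantity ($k=1$). This is a generic illustration on hypothetical data, not the code used for the experiments; the function name `bootstrap` and its parameters are assumptions.

```python
import random
import statistics

def bootstrap(population, quantity, ns=250, ss=None, seed=0):
    """Plain bootstrap: NS re-samplings of size SS, drawn with replacement.

    Returns the mean and the sample standard deviation of the quantity over
    the re-samplings, together with the list of computed values.
    """
    rng = random.Random(seed)
    ss = len(population) if ss is None else ss
    values = []
    for _ in range(ns):
        sample = rng.choices(population, k=ss)   # sampling with replacement
        values.append(quantity(sample))
    return statistics.mean(values), statistics.stdev(values), values

# Hypothetical population; the quantity of interest is the sample mean.
population = list(range(1, 101))
mean_hat, std_hat, values = bootstrap(population, statistics.mean, ns=200, seed=1)
print("bootstrap mean:", mean_hat, " bootstrap std:", std_hat)
```

Any function of the sample can be passed as `quantity`, for instance a routine that refits a line on the re-sampled pairs and returns the corresponding forecast.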

Note that such a bootstrap technique is in some sense basic as it does not account for possible presence of serial dependence in the considered time series (in that case, so-called moving blocks bootstrap methods should be used). Anyway, the choice to use a plain bootstrap technique is deliberate in order to stress the robustness of our Algorithm MT–SVM with respect to specific features of the time series data.

When applying an ML model whose outcomes yield random quantities, it is desirable that the results are provided with confidence intervals. We highlight that widely used techniques within ML, like cross-validation, are unable to immediately give confidence intervals for the quantities they report. In our SVM–based ML procedure we used bootstrap to estimate the final mean value and the standard deviation of Bitcoin price, with reference to the following 13 different time–windows for the pairs Bitcoin price vs. its SF:

Time–window 1	January 1st, 2011 – March 31st, 2021
Time–window 2	January 1st, 2011 – April 30th, 2021
Time–window 3	January 1st, 2011 – May 31st, 2021
Time–window 4	January 1st, 2011 – June 30th, 2021
Time–window 5	January 1st, 2011 – July 31st, 2021
Time–window 6	January 1st, 2011 – August 31st, 2021
Time–window 7	January 1st, 2011 – September 30th, 2021
Time–window 8	January 1st, 2011 – October 31st, 2021
Time–window 9	January 1st, 2011 – November 30th, 2021
Time–window 10	January 1st, 2011 – December 31st, 2021
Time–window 11	January 1st, 2011 – January 31st, 2022
Time–window 12	January 1st, 2011 – February 28th, 2022
Time–window 13	January 1st, 2011 – March 31st, 2022

Table 2 Forecasts of the value of the logarithm for Bitcoin price, along with some related statistics, considering 1 month as prediction horizon


The dataset associated with each time–window is used to apply the Algorithm MT–SVM, in order to generate the parameters of the central line in Fig. 3. Then, the forecast of Bitcoin price is computed in the subsequent three months (e.g. for Time–window 1 we used the data in the interval January 1st, 2011 – March 31st, 2021, and computed a forecast for Bitcoin price at the end of April 2021, May 2021 and June 2021). Finally, this last scheme is repeated for NS re–samplings of data in the same time–window, so that bootstrap statistics are available. Following standard guidelines from the literature, we also set in our bootstrap framework: number of re–samplings $NS = 250$; size of each sample $SS = \mathrm {the \ size \ of \ dataset \ in \ the \ current \ time}$–$\textrm{window}$.

In Tables 2, 3 and 4 we present the 13 forecasts achieved respectively for each of the considered prediction horizons, i.e. after 1 month (Table 2), after 2 months (Table 3), and after 3 months (Table 4). In particular, in each table: Column 1 reports the true value of Bitcoin price; Column 2 gives the value of the forecast obtained by the Algorithm MT–SVM; Column 3 and Column 4 respectively provide the average value and the standard deviation of the forecast obtained by the Algorithm MT–SVM coupled with the bootstrap method; Column 5 shows the 95% confidence interval computed using the average value and standard deviation in Column 3 and Column 4; Columns 6 and 7 present the relative errors respectively associated with the forecasts from the Algorithm MT–SVM and from the Algorithm MT–SVM coupled with the bootstrap method.
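The columns just described can be computed from a set of bootstrap forecasts as in the following sketch. It rests on assumptions that the text does not state explicitly: the 95% confidence interval is taken as mean $\pm \, 1.96$ standard deviations (normal approximation), the standard deviation is the sample one, and the relative error is (forecast − true)/true. All numbers are hypothetical log-price values, not entries of Tables 2, 3 and 4.

```python
import statistics

def summarize(true_value, mtsvm_forecast, bootstrap_forecasts, z=1.96):
    """Build one table row from a true value, a forecast and bootstrap forecasts."""
    mean = statistics.mean(bootstrap_forecasts)
    std = statistics.stdev(bootstrap_forecasts)      # sample standard deviation
    ci = (mean - z * std, mean + z * std)            # assumed normal 95% interval
    return {
        "true": true_value,
        "mtsvm": mtsvm_forecast,
        "boot_mean": mean,
        "boot_std": std,
        "ci95": ci,
        "rel_err_mtsvm": (mtsvm_forecast - true_value) / true_value,
        "rel_err_boot": (mean - true_value) / true_value,
        "inside_ci": ci[0] <= true_value <= ci[1],
    }

# Hypothetical values of ln(price), for illustration only.
row = summarize(true_value=10.5,
                mtsvm_forecast=10.8,
                bootstrap_forecasts=[10.0, 10.2, 10.4, 10.6, 10.8])
for name, value in row.items():
    print(name, value)
```

Counting the rows for which `inside_ci` is true gives coverage ratios of the kind discussed in the observations that follow.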

Table 3 Forecasts of the value of the logarithm for Bitcoin price, along with some related statistics, considering 2 months as prediction horizon


Table 4 Forecasts of the value of the logarithm for Bitcoin price, along with some related statistics, considering 3 months as prediction horizon


Observation 5.2

We highlight that for each forecast horizon, most of the true values to predict fall within the confidence interval, namely 10 out of 13 for the one-month horizon, 10 out of 13 for the two-month horizon, and 11 out of 13 for the three-month horizon. Furthermore, the other actual values not falling within this interval are generally close to it. Given the high volatility that the Bitcoin price has often experienced, our outcomes can be considered more than satisfactory. Lastly note that, unlike what usually happens when forecasting time series data, the above three ratios 10/13, 10/13 and 11/13 do not decrease as the forecast horizon length increases.

Observation 5.3

We stress that for all the forecast horizons, the average of the relative error associated with the Algorithm MT–SVM coupled with the bootstrap method (i.e. the average of the values in the last column of Tables 2, 3 and 4) is lower than the average of the relative error associated with the Algorithm MT–SVM alone (i.e. the average of the values in the second last column of Tables 2, 3 and 4), namely: 0.28% vs. 2.31% for the one-month horizon, 0.72% vs. 2.71% for the two-month horizon, and 0.56% vs. 2.54% for the three-month horizon. We also point out that the standard deviations of the two relative errors are very close to each other in each of the three considered forecast horizons.

All these results give numerical evidence that in this context coupling a bootstrap method with an SVM–based approach can improve the quality of the forecast.

Lastly, as stated at the end of Sect. 4, we consider the linear regression model for comparative purposes with our SVM-based approach enhanced through bootstrap. In particular, similarly to what was done in the previous analyses, we applied the linear regression model to predict the future price of Bitcoin with reference to the same 13 time-windows and for each of the same forecast horizons.

All the main findings strongly suggest that the Algorithm MT–SVM coupled with the bootstrap method performs better than the linear regression model. Figure 5 shows that all the 95% confidence intervals related to the former approach (graphically represented in red) are strictly contained in the corresponding 95% confidence intervals related to the linear regression model (graphically represented in black), for all the considered time-windows and all the forecast horizons.

Observation 5.4

We highlight that for all the forecast horizons, the average of the standard error associated to the Algorithm MT–SVM coupled with the bootstrap method is lower than the average of the standard error associated to the linear regression model, namely: 0.2959 vs. 0.9935 for the one-month horizon, 0.2941 vs. 0.9935 for the two-month horizon, and 0.2926 vs. 0.9935 for the three-month horizon.

Observation 5.5

We point out that for all the forecast horizons, the average of the relative error associated with the Algorithm MT–SVM enhanced using the bootstrap method is lower, in absolute value, than the average of the relative error associated with the linear regression model, namely: 0.28% vs. −2.49% for the one-month horizon, 0.72% vs. −2.00% for the two-month horizon, and 0.56% vs. −2.12% for the three-month horizon. We also stress that the standard deviations of the two relative errors are very close to each other in each of the three considered forecast horizons.

These findings provide numerical evidence that, at least in this forecasting context, coupling a bootstrap method with an SVM–based approach can improve the quality of the performance with respect to the one obtained from the use of a linear regression model.

6 Conclusions and future work

This paper contributes to the investigation of reliable long term models for Bitcoin price forecast. We recall that in the last decade Bitcoin has become a widely observed digital asset, for possible investments by both private and institutional stakeholders. In this paper we have specifically proposed a couple of models, following two different perspectives. The first one was suggested by considering a more standard regression analysis, while the second one is definitely novel in the literature, being obtained by combining a preliminary multiobjective approach with an ML scheme, where a sequence of SVMs is considered.

We are persuaded that several factors strongly contribute to affect long term Bitcoin price, other than the SF. Nevertheless, the dependency of Bitcoin price on its SF was suggested by several authors, and is also considered in the first proposal of this paper. In this regard, a natural future extension for our analysis will have to consider a set of multiple elements encompassing SF, so that our ML–based proposal will have to be enhanced.

Notes

Note that reliable data in the early years of Bitcoin history may be hard to retrieve, because in 2009–2010 there were not yet observers in charge of accurate data collection. Hence, we decided to completely revise and update our database, including more recent data but discarding the pairs corresponding to the years 2009–2010. In the attempt to collect more reliable data we downloaded and compared data from the websites: https://www.blockchain.com/charts/total-bitcoins, https://www.cryptocurrencychart.com/, https://datahub.io/cryptocurrency/bitcoin, https://www.investing.com/crypto/bitcoin/historical-data, https://finance.yahoo.com/cryptocurrencies.
In particular, see the webpage www.buybitcoinworldwide.com/stats/stock-to-flow/.
Note that 463 days is about 95% of $4/3$ years.
For further information on data for Bitcoin prices and the number of minted bitcoins, the reader may refer to the footnote at page 2.

References

Aggarwal, D., Chandrasekaran, S., & Annamalai, B. (2020). A complete empirical ensemble mode decomposition and support vector machine-based approach to predict Bitcoin prices. Journal of Behavioral and Experimental Finance, 27, 100335.
Baur, D. G., & Dimpfl, T. (2021). The volatility of Bitcoin and its role as a medium of exchange and a store of value. Empirical Economics, 61, 2663–2683.
Bernstein, D. S. (2009). Matrix mathematics: Theory, facts, and formulas (2nd ed.). Princeton University Press.
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society Series B (Methodological), 26(2), 211–252.
Buy Bitcoin Worldwide (2019). Bitcoin stock to flow model live chart, https://www.buybitcoinworldwide.com
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge University Press.
Deng, N., Tian, Y., & Zhang, C. (2013). Support vector machines - optimization based theory, algorithms, and extensions. Chapman and Hall or CRC.
Graybill, F. A., & Iyer, H. K. (1994). Regression analysis: Concepts and applications. Duxbury Press.
Hastie, T., Tibshirani, R., & Friedman, J. (2008). The elements of statistical learning (2nd ed.). Springer.
https://www.blockchain.com/charts/total-bitcoins
Jarque, C. M., & Bera, A. K. (1987). A test for normality of observations and regression residuals. International Statistical Review, 55(2), 163–172.
Marsland, S. (2015). Machine learning: An algorithmic perspective (2nd ed.). CRC Press Taylor & Francis Group.
Nakamoto, S. (2008). Bitcoin: A peer-to-peer electronic cash system, http://www.bitcoin.org/bitcoin.pdf
Nakamoto, S., et al. [Anonymous] (2014). Bitcoin source code – amount constraints, https://github.com/bitcoin/bitcoin
PlanB (2019). Modeling Bitcoin’s Value with Scarcity, https://medium.com/@100trillionUSD/modeling-bitcoins-value-with-scarcity-91fa0fc03e25
Pontiggia, A., & Fasano, G. (2021). Data analytics and machine learning paradigm to gauge performances combining classification, ranking and sorting for system analysis, Working Paper 05/2021, Department of Management, University Ca’ Foscari of Venice.
Sreekanth Reddy, L., & Sriramya, P. (2020). A research on Bitcoin price prediction using machine learning algorithms. International Journal of Scientific & Technology Research, 9(4), 1600–1604.
Various Authors (2011). Total circulating Bitcoin, https://www.blockchain.com/charts/total-bitcoins
Vapnik, V. (1995). The nature of the statistical learning theory. Springer.
Vapnik, V. (1998). Statistical learning theory. Wiley.
Vigna, P., & Casey, M.J. (2015). The age of cryptocurrency: How bitcoin and digital money are challenging the global economic order (1st ed.). St. Martin’s Press.


Acknowledgements

Marco Corazza and Giovanni Fasano wish to thank Istituto Nazionale di Alta Matematica (IN$\delta $AM), Giovanni Fasano wishes to thank Consiglio Nazionale delle Ricerche – Istituto di Ingegneria del Mare (CNR–INM), for the support they received. Giovanni Fasano also thanks his three lawyer friends Maria, Maristella and Massimo, whose valuable perspective considerably contributed to inspire the contents of this paper.

Author information

Authors and Affiliations

Department of Computer, Control, and Management Engineering ‘A.Ruberti’, University of Rome “La Sapienza”, via Ariosto 25, 00185, Rome, Italy
Andrea Caliciotti
Enel Green Power S.p.A., Viale Regina Margherita 125, 00198, Rome, Italy
Andrea Caliciotti
Department of Economics, Ca’ Foscari University of Venice, Cannaregio 873, 20121, Venice, Italy
Marco Corazza
Department of Management, Ca’ Foscari University of Venice, Cannaregio 873, 20121, Venice, Italy
Giovanni Fasano

Authors

Andrea Caliciotti
View author publications
You can also search for this author in PubMed Google Scholar
Marco Corazza
View author publications
You can also search for this author in PubMed Google Scholar
Giovanni Fasano

Corresponding author

Correspondence to Marco Corazza.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Caliciotti, A., Corazza, M. & Fasano, G. From regression models to machine learning approaches for long term Bitcoin price forecast. Ann Oper Res 336, 359–381 (2024). https://doi.org/10.1007/s10479-023-05444-w


Accepted: 18 May 2023
Published: 03 July 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s10479-023-05444-w
