1 Introduction

Bitcoin, a pioneer in the realm of cryptocurrencies, was first conceptualized by Nakamoto (2008) and came into practical use in January 2009. Within its ecosystem, each transaction is transparently recorded in a ledger maintained by a decentralized network of nodes, eliminating the need for intermediaries such as central banks or governing institutions. Originally developed as a digital currency intended to replace traditional money, Bitcoin is now predominantly recognized and utilized as an innovative investment asset, as indicated by Baur et al. (2018). This recognition is further substantiated by several studies arguing that incorporating Bitcoin into traditional asset portfolios could enhance overall performance and offer diversification benefits. For instance, Platanakis and Urquhart (2020) conclude that including Bitcoin in a traditional portfolio of stocks and bonds can significantly increase risk-adjusted returns, while Akhtaruzzaman et al. (2020) demonstrate Bitcoin's capability to function as a hedge against risks in industrial portfolios and bonds. At the time of writing, according to CoinMarketCap (footnote 1), Bitcoin's market capitalization has soared to $723 billion, with daily trading volumes exceeding $14 billion, demonstrating its firm establishment as a formidable financial asset in the global market. Based on data from CompaniesMarketCap (footnote 2), this market capitalization is comparable to that of the world's 10th largest listed company, ranking alongside companies like Meta Platforms, Berkshire Hathaway, and Tesla.

Despite its astonishing market capitalization, studies on the informational efficiency of the Bitcoin market suggest that it remains less mature than traditional stock markets. For instance, Vidal-Tomás and Ibañez (2018) and Sensoy (2019) argue that the informational efficiency of the Bitcoin market has been increasing over time, but Al-Yahyaee et al. (2018) contend that the Bitcoin market is inefficient when compared to gold, stock, and currency markets. In addition, a study by Ante (2023) tracking Elon Musk's statements found that his posts could increase Bitcoin's price by up to 16.9%. In the stock market, by contrast, it is unlikely that an individual's remarks would have such a significant impact on a company with a market capitalization as high as Bitcoin's.

The emerging Bitcoin market exhibits several key differences from traditional stock markets. Firstly, as shown by Chaim and Laurini (2018) and Baur and Dimpfl (2021), the Bitcoin market is characterized by high price volatility, representing a significant risk for investors, yet also offering the potential for high returns. Secondly, the Bitcoin market operates 24/7, enabling investors worldwide to engage in trading at any time, unlike traditional stock markets, which are limited to specific trading hours in certain regions. Thirdly, as Vidal-Tomás and Ibañez (2018) emphasized, Bitcoin is underpinned by decentralized ledger technology, which does not require central institutional management or intervention, making it less susceptible to the influences of central banks and governments compared to traditional markets. Lastly, the Bitcoin market, due to its novelty and innovative nature, attracts unique market participants and investment behaviors not typically seen in traditional stock markets. In this context, Kim et al. (2020) argue that individual investors play an extremely important role in the Bitcoin market, a setting where information asymmetry is more likely to occur. Considering these factors, if individual investors play a significant role in the nascent and immature Bitcoin market, it is plausible that many decisions are made based on statements by celebrities, which motivates the hypothesis that sentiments on social media could be a potent predictor of the Bitcoin market. Our study aims to generalize this hypothesis, not by focusing solely on well-known figures like Elon Musk, but by utilizing statements from multiple celebrities.

In the field of sentiment analysis, there has been a growing trend in recent years to utilize this approach in the stock market, particularly in conjunction with 'alternative data' that differs from traditional data such as financial reports and economic indicators. For instance, Baker and Wurgler (2006) empirically demonstrated that investor sentiment, measured using a top-down approach, significantly impacts individual companies and the stock market as a whole. Tetlock (2007) showed that pessimistic columns in the Wall Street Journal (WSJ) could predict downward pressure on the Dow Jones, followed by a market price return to fundamentals. The advancement of natural language processing (NLP) tools, especially those of machine learning, has enabled the extraction of diverse features from text. For example, Chen et al. (2022) elucidated the interplay between market trends and pandemic narratives by extracting various features from financial news, such as narrative intensity, textual sentiments and tones, and virality. Since the late 2000s, the rapid proliferation of social media platforms as tools for information sharing, communication, and collaboration has led to a surge in sentiment analysis using data from social networking services (SNS). Bollen et al. (2011) suggested the possibility that the collective mood reflected on X (formerly Twitter; footnote 3) could impact the stock market. Zhang et al. (2011) discovered a significant negative correlation between the proportion of emotional posts and major indices like the Dow Jones, NASDAQ, and S&P500. However, when it comes to the emerging Bitcoin market, while there are some available studies such as Mai et al. (2018) and Baig et al. (2019), detailed analyses addressing the effects of various sentiments extracted from SNS posts are still insufficient. This paper endeavors to bridge this gap by conducting a comprehensive time series analysis of Bitcoin, focusing on how varying sentiments expressed on social media platforms influence its market dynamics and price fluctuations.

This research employs the Vector Autoregression (VAR) model to analyze the dynamic interplay between Bitcoin prices and the sentiments expressed by celebrities. Initially introduced by Sims (1980) as a tool for empirical analysis of macroeconomic dynamics, the VAR model is founded on the philosophy of 'letting the data speak for themselves'. This approach allows us to understand the magnitude and direction of influence and interactions without pre-assuming any specific trends, purely based on the data. The VAR model is particularly adept at capturing the dynamic relationships among multiple time series data using the lagged values (lags) of each variable. This makes it suitable for modeling the potential mutual influences between Bitcoin prices and sentiment over time. Furthermore, the VAR model is capable of handling the endogenous relationships between variables, effectively analyzing internal dynamics such as how Bitcoin prices may influence sentiment, which in turn, feeds back into the prices. Our research also leverages tools like the Granger causality test, which helps in verifying whether one time series significantly influences another, and the impulse response function, which tracks over time how a shock in one variable impacts others, thereby providing a comprehensive analysis of the relationship between Bitcoin prices and sentiment.

In this study, we conduct a detailed analysis using the VAR model to examine the impact of sentiment data from celebrities on X, specifically positive and negative sentiments, on Bitcoin price fluctuations. The research aims to elucidate the unique characteristics of the emerging Bitcoin market, providing insightful information for market participants and researchers. This paper is structured as follows: in Sect. 2, we describe data and variables, while Sect. 3 elucidates the relevant methodology. We present results in Sect. 4 and conclude the paper with the final remarks in Sect. 5.

2 Data and variables

This section describes the data and variables. Following the methodology of Tetlock (2007), which involved including lagged volume as a variable in regression to capture the liquidity effect, this study utilizes data on Bitcoin volume. Specifically, the data encompasses Bitcoin prices and trading volumes, as well as X posts by celebrities in the Bitcoin community. It should be noted that the X posts were collected using the official API of the “Academic Research product track,” a program designed for researchers. However, due to the discontinuation of API access in 2023, the collection of posts was limited to a span of 36 months, from July 1, 2019, to June 30, 2022. The gathered data were subjected to initial processing, resulting in the preparation of seven types of daily time-series data.

2.1 Log return on Bitcoin prices (Pt)

Bitcoin can be traded 24/7 on many exchanges. Daily historical data was obtained using the API provided by the U.S. company Coinbase (footnote 4). This historical data includes the opening, closing, high, and low prices and the trading volume of Bitcoin in U.S. dollars. From this data, the closing price at time step t was used as the daily Bitcoin price, denoted as closet, and the logarithmic return on Bitcoin price, Pt, was defined as follows:

$${P}_{t}={\text{ln}}\left(\frac{clos{e}_{t}}{clos{e}_{t-1}}\right)$$
(1)
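
As a minimal illustration of Eq. (1), the sketch below computes log returns from a list of daily closing prices. The function name `log_returns` and the sample prices are hypothetical and are not taken from the study's data.

```python
import math

def log_returns(closes):
    """Return ln(close_t / close_{t-1}) for each pair of consecutive closes."""
    return [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]

# Hypothetical closing prices in USD (illustrative only).
closes = [10000.0, 10500.0, 9975.0]
print(log_returns(closes))
```

One property worth noting is that log returns add up over time: the sum of the daily values equals the log return over the whole span.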

As Banerjee et al. (1993) have noted, using logarithmic returns helps transform non-stationary price data into stationary data, enabling the application of econometric methods that assume stationarity. The dataset covers the period from July 1, 2019, to June 30, 2022. Figures 1 and 2 illustrate the trends of closet and Pt, respectively. During this period, the price closed above the previous day's close on 51.37% of days and below it on 48.54% of days. Other characteristics are presented in Table 1.

Fig. 1 The trend of closet: Bitcoin prices (01/07/2019–30/06/2022)

Fig. 2 The trend of Pt: Log return on Bitcoin prices (01/07/2019–30/06/2022)

Table 1 Descriptive statistics for variables closet and Pt

2.2 Log return on Bitcoin volumes (Vt)

The acquired historical data encompass the daily trading volumes from July 1, 2019, to June 30, 2022. These volumes are defined as daily Bitcoin volumes volumet at time step t and are illustrated in Fig. 3. During the period, it was observed that 47.35% of the volumes exceeded the previous day's volumes, while 52.55% fell below.

Fig. 3 The trend of volumet: Bitcoin volumes (01/07/2019–30/06/2022)

As evident from the figure, multiple instances of data points significantly deviating from the average were observed in the volume data. To enhance the stability of the model, outlier detection and preprocessing were conducted as part of the statistical treatment. Specifically, data points with an absolute Z-score exceeding 3 were identified as outliers and replaced with values corresponding to a Z-score of 3. This preprocessing resulted in the replacement of 10 data points (0.84% of the total). The preprocessed daily volumes at time step t are defined as adjusted volumet. The log returns, denoted as Vt, were defined as follows.

$${V}_{t}={\text{ln}}\left(\frac{adjusted \, volum{e}_{t}}{adjusted \, volum{e}_{t-1}}\right)$$
(2)
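
The outlier treatment and Eq. (2) can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the Z-score is computed with the population standard deviation (the text does not say which estimator was used), and the sample volumes are invented.

```python
import math
import statistics

def clip_by_zscore(values, z_max=3.0):
    """Replace points whose |z-score| exceeds z_max with the value at z = +/- z_max."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    lower, upper = mean - z_max * std, mean + z_max * std
    return [min(max(v, lower), upper) for v in values]

def log_returns(series):
    """Return ln(x_t / x_{t-1}) for consecutive elements, as in Eq. (2)."""
    return [math.log(curr / prev) for prev, curr in zip(series, series[1:])]

# Hypothetical daily volumes with one extreme spike (illustrative only).
volumes = [10.0] * 20 + [1000.0]
adjusted = clip_by_zscore(volumes)
v = log_returns(adjusted)
print(adjusted[-1], v[-1])
```

Clipping rather than dropping the extreme days keeps the daily series contiguous, which matters because the log return at each step needs the previous day's value.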

The trend of Vt is presented in Fig. 4. The characteristics of both volumet and Vt are summarized in Table 2.

Fig. 4 The trend of Vt: Log return on Bitcoin volumes (01/07/2019–30/06/2022)

Table 2 Descriptive statistics for variables volumet and Vt

2.3 Positive/Negative post rate by Bitcoin celebrities (Spost/Snegt)

For this study, we selected the X accounts of 30 celebrities based on rankings from the Bitcoin community by Hive.one (footnote 5), as displayed in Table 3. Hive.one ranks individuals primarily based on their X follower graph, facilitating the discovery of reputable accounts within specific communities. All posts from these selected accounts, from July 1, 2019, to June 30, 2022, were collected using the official X API.

Table 3 Selected X accounts of 30 celebrities from the Bitcoin community

For the analysis of positive and negative sentiments, this study employed the VADER (Valence Aware Dictionary and sEntiment Reasoner) model, developed by Hutto and Gilbert (2014). VADER is a rule-based model designed for the sentiment analysis of social media text and constructs a gold-standard sentiment lexicon with over 7,500 lexical features, including words, emoticons, slang, and acronyms. The VADER lexicon is particularly attuned to the context of microblogging platforms such as X. Therefore, VADER has been used in various studies in the field of data science using social media sentiment, including Valdez et al. (2020) and Bouktif et al. (2020). The compound score, a key metric used in VADER for social media text sentiment analysis, is calculated by normalizing the sum of valence scores of each token within the text, as shown in the following formula:

$$compound \, score=\frac{{\sum }valence \, score \, of \, token}{\sqrt{{\left({\sum }valence \, score \, of \, token\right)}^{2}+\alpha }}$$
(3)

In this formula, α represents a constant for normalization. The compound score indicates the overall sentiment strength of a given text, ranging from -1 (most extreme negative) to +1 (most extreme positive). In this study, we employed the following thresholds as described in the documentation of vaderSentiment (footnote 6), a Python implementation of VADER:

  • Positive sentiment: compound score ≥ 0.05

  • Neutral sentiment: compound score > -0.05 and < 0.05

  • Negative sentiment: compound score ≤ -0.05

Sentiment analysis was performed on the collected posts from the 30 celebrities, categorizing them into positive, neutral, and negative based on the above thresholds. Daily counts of positive (post), negative (negt), and neutral (neut) posts were aggregated. Time series Spost and Snegt were then calculated using the following formula:

$${S}_{t}^{pos}=\frac{po{s}_{t}}{po{s}_{t}+ne{g}_{t}+ne{u}_{t}}$$
(4)
$${S}_{t}^{neg}=\frac{ne{g}_{t}}{po{s}_{t}+ne{g}_{t}+ne{u}_{t}}$$
(5)
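
The normalization in Eq. (3), the thresholds above, and the rates in Eqs. (4) and (5) can be sketched in a few lines. This is a hand-rolled illustration, not the vaderSentiment library itself, which applies additional heuristics (for example for negation and emphasis) before normalizing. The value α = 15 is assumed here as the library's default, and the token valences are hypothetical.

```python
import math
from collections import Counter

ALPHA = 15.0  # assumption: default normalization constant in vaderSentiment

def compound_score(valences, alpha=ALPHA):
    """Normalize a sum of token valence scores into (-1, 1), as in Eq. (3)."""
    total = sum(valences)
    return total / math.sqrt(total * total + alpha)

def classify(compound):
    """Map a compound score to a label using the thresholds listed above."""
    if compound >= 0.05:
        return "pos"
    if compound <= -0.05:
        return "neg"
    return "neu"

def daily_rates(labels):
    """Return (S_pos, S_neg) for one day's post labels, as in Eqs. (4)-(5)."""
    counts = Counter(labels)
    total = counts["pos"] + counts["neg"] + counts["neu"]
    if total == 0:
        return None  # no posts that day; the text does not say how such days are treated
    return counts["pos"] / total, counts["neg"] / total

# Hypothetical token valences for three posts published on the same day.
posts = [[1.9, 1.3], [-2.5], []]
labels = [classify(compound_score(p)) for p in posts]
print(labels, daily_rates(labels))
```

Because neutral posts stay in the denominator, the positive and negative rates do not sum to one, so the two series carry distinct information.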

The trends for Spost and Snegt are presented in Figs. 5 and 6, respectively. Descriptive statistics for post, negt, neut are shown in Table 4, and for Spost and Snegt in Table 5.

Fig. 5 The trend of Spost: Positive post rate by Bitcoin celebrities (01/07/2019–30/06/2022)

Fig. 6 The trend of Snegt: Negative post rate by Bitcoin celebrities (01/07/2019–30/06/2022)

Table 4 Descriptive statistics for variables post, negt, and neut
Table 5 Descriptive statistics for variables Spost and Snegt

2.4 Valence/Arousal/Dominance post rate by Bitcoin celebrities (Svalt/Sarot/Sdomt)

In the field of psychological research, emotions are traditionally conceptualized through three fundamental elements: valence (the pleasantness), arousal (the intensity of emotion), and dominance (the degree of control). Albert Mehrabian initially introduced the concept of a three-dimensional approach to emotions in his work in the 1960s, focusing on the dimensions of pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness. This framework was further developed and refined in collaboration with James A. Russell. Their collaborative work in the 1970s, such as Mehrabian and Russell (1974), laid the foundation for the VAD (Valence, Arousal, and Dominance) model.

To conduct a multifaceted analysis that goes beyond the positive–negative dimension, the three dimensions of the VAD model are utilized as variables to examine the dynamic relationship with Bitcoin prices and volumes. Sentiment analysis for valence, arousal, and dominance employs the NRC VAD Lexicon and the Python package EmotionDynamics (footnote 7), averaging the lexicon scores of the words in each text to obtain its overall score. The NRC VAD Lexicon, developed by Mohammad (2018), contains over 20,000 common English words, each scored for valence (ranging from 0: extremely unpleasant to 1: extremely pleasant), arousal (from 0: extremely sleepy/sluggish to 1: extremely activated/excited), and dominance (from 0: extremely weak to 1: extremely powerful). A sentiment is considered active if the score calculated using the NRC VAD Lexicon exceeds the mean by more than one standard deviation. We aggregate daily counts of active valence posts (valt), arousal posts (arot), and dominance posts (domt), and define the total daily posts as all postt. Time series Svalt, Sarot, Sdomt are then established as follows:

$${S}_{t}^{val}=\frac{{val}_{t}}{{all \, post}_{t}}$$
(6)
$${S}_{t}^{aro}=\frac{{aro}_{t}}{{all \, post}_{t}}$$
(7)
$${S}_{t}^{dom}=\frac{{dom}_{t}}{{all \, post}_{t}}$$
(8)
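
The "active" rule and Eqs. (6) to (8) can be sketched as follows. This is not the EmotionDynamics package. The mini-lexicon is invented for illustration, and the mean and standard deviation are assumed to be taken over all scored posts, which the text does not state explicitly.

```python
import statistics

# Hypothetical mini-lexicon: word -> (valence, arousal, dominance), each in [0, 1].
# The numbers are invented for illustration and are not NRC VAD Lexicon entries.
TOY_LEXICON = {
    "moon": (0.90, 0.80, 0.70),
    "crash": (0.10, 0.90, 0.30),
    "hold": (0.60, 0.30, 0.60),
}
DIMENSIONS = {"valence": 0, "arousal": 1, "dominance": 2}

def post_score(text, dimension, lexicon=TOY_LEXICON):
    """Average one VAD dimension over the words of `text` found in the lexicon."""
    idx = DIMENSIONS[dimension]
    scores = [lexicon[w][idx] for w in text.lower().split() if w in lexicon]
    return statistics.fmean(scores) if scores else None

def active_flags(scores):
    """Flag scores strictly above mean + 1 standard deviation of the scored posts."""
    valid = [s for s in scores if s is not None]
    if not valid:
        return [False] * len(scores)
    threshold = statistics.fmean(valid) + statistics.pstdev(valid)
    return [s is not None and s > threshold for s in scores]

def daily_rate(flags):
    """Share of a day's posts flagged as active, as in Eqs. (6)-(8)."""
    return sum(flags) / len(flags) if flags else None

scores = [post_score(t, "valence") for t in ["moon moon", "hold", "crash", "gm"]]
print(scores, active_flags(scores))
```

Posts with no lexicon match receive no score in this sketch and are never flagged, but they still count toward the total number of posts in the denominator.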

The trends of Svalt, Sarot, and Sdomt are presented in Figs. 7, 8, and 9, respectively. The descriptive statistics for valt, arot, and domt are provided in Table 6, while the descriptive statistics for Svalt, Sarot, and Sdomt are displayed in Table 7.

Fig. 7 The trend of Svalt: Valence post rate by Bitcoin celebrities (01/07/2019–30/06/2022)

Fig. 8 The trend of Sarot: Arousal post rate by Bitcoin celebrities (01/07/2019–30/06/2022)

Fig. 9 The trend of Sdomt: Dominance post rate by Bitcoin celebrities (01/07/2019–30/06/2022)

Table 6 Descriptive statistics for variables valt, arot, and domt
Table 7 Descriptive statistics for variables Svalt, Sarot, and Sdomt

3 Analysis methods

In the analysis of this study, a three-variable Vector Autoregression (VAR) model is employed to investigate the potential interactions among selected variables. Specifically, to examine the impact of positive sentiment associated with celebrities, the variables Pt, Vt, and Spost are utilized. Similarly, to explore the influence of negative sentiment, the variables Pt, Vt, and Snegt are used. Additionally, to assess the effects of VAD sentiment, three sets of variables are employed: Pt, Vt, and Svalt; Pt, Vt, and Sarot; and Pt, Vt, and Sdomt.

3.1 Data validation

Before estimating the VAR model, the stationarity of all time series data was tested using the Augmented Dickey–Fuller (ADF) test to avoid the possibility of spurious regression. Time series with unit roots are non-stationary and behave like a random walk, in which the effects of past shocks do not die out. Because of this characteristic, series with unit roots are challenging to predict. The importance of testing for stationarity, especially in the context of macroeconomic data, is emphasized by Mushtaq (2011). The ADF test is named after Dickey and Fuller (1979), who detailed the properties of time series with unit roots in autoregressive models and the distribution of the associated statistics, offering a statistical criterion to ascertain whether a time series has a unit root. The ADF test extends this approach by adding lagged difference terms to the test regression, which accommodates higher-order autoregressive dynamics.
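
To make the idea concrete, the sketch below computes the basic Dickey–Fuller t-statistic with an intercept and no lagged difference terms. It is a simplified stand-in for the ADF test used in the study, not a reproduction of it. The function name is hypothetical, the simulated series are illustrative, and -2.86 is the standard asymptotic 5% critical value for the constant-only case.

```python
import math
import random

def dickey_fuller_t(y):
    """t-statistic of gamma in dy_t = a + gamma * y_{t-1} + e_t (no lagged differences)."""
    x = y[:-1]
    d = [b - a for a, b in zip(y, y[1:])]
    m = len(x)
    x_bar = sum(x) / m
    d_bar = sum(d) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxd = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
    gamma = sxd / sxx
    intercept = d_bar - gamma * x_bar
    ssr = sum((di - intercept - gamma * xi) ** 2 for xi, di in zip(x, d))
    se = math.sqrt(ssr / (m - 2) / sxx)
    return gamma / se

CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, regression with a constant

random.seed(42)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
walk, level = [], 0.0
for shock in noise:
    level += shock
    walk.append(level)

# A statistic below the critical value rejects the unit-root null hypothesis.
print("white noise:", dickey_fuller_t(noise))
print("random walk:", dickey_fuller_t(walk))
```

The statistic does not follow a Student t distribution under the null hypothesis, which is why it is compared with Dickey–Fuller critical values rather than the usual ones.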

3.2 VAR model

The general equation of a VAR model with p lags can be represented as follows:

$${Y}_{t}=C+{A}_{1}{Y}_{t-1}+{A}_{2}{Y}_{t-2}+\cdots +{A}_{p}{Y}_{t-p}+{u}_{t}$$
(9)

Here, Yt is a vector of k endogenous variables at time t. Each Ai (where i = 1,2,…,p) is a k × k coefficient matrix. These matrices capture the influence that past values of the vector Y have on its current value. The term C is a k × 1 vector of constants (intercepts). The ut represents the error terms at time t, which are assumed to be white noise with a zero mean and a constant covariance matrix. The first model of this study, a three-variable VAR model of Pt, Vt, and Spost, is expressed as:

$${P}_{t}={C}_{1}+{\sum }_{i=1}^{p}{\alpha }_{1i} {P}_{t-i}+{\sum }_{i=1}^{p}{\beta }_{1i} {V}_{t-i}+{\sum }_{i=1}^{p}{\gamma }_{1i} {S}_{t-i}^{pos}+{u}_{1t}$$
(10)
$${V}_{t}={C}_{2}+{\sum }_{i=1}^{p}{\alpha }_{2i} {P}_{t-i}+{\sum }_{i=1}^{p}{\beta }_{2i} {V}_{t-i}+{\sum }_{i=1}^{p}{\gamma }_{2i} {S}_{t-i}^{pos}+{u}_{2t}$$
(11)
$${S}_{t}^{pos}={C}_{3}+{\sum }_{i=1}^{p}{\alpha }_{3i} {P}_{t-i}+{\sum }_{i=1}^{p}{\beta }_{3i} {V}_{t-i}+{\sum }_{i=1}^{p}{\gamma }_{3i} {S}_{t-i}^{pos}+{u}_{3t}$$
(12)
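
For reference, Eqs. (10)–(12) are the component form of Eq. (9) with k = 3. Stacking them, with the same symbols, gives the following sketch of the matrix form:

$$\left[\begin{array}{c}{P}_{t}\\ {V}_{t}\\ {S}_{t}^{pos}\end{array}\right]=\left[\begin{array}{c}{C}_{1}\\ {C}_{2}\\ {C}_{3}\end{array}\right]+{\sum }_{i=1}^{p}\left[\begin{array}{ccc}{\alpha }_{1i}& {\beta }_{1i}& {\gamma }_{1i}\\ {\alpha }_{2i}& {\beta }_{2i}& {\gamma }_{2i}\\ {\alpha }_{3i}& {\beta }_{3i}& {\gamma }_{3i}\end{array}\right]\left[\begin{array}{c}{P}_{t-i}\\ {V}_{t-i}\\ {S}_{t-i}^{pos}\end{array}\right]+\left[\begin{array}{c}{u}_{1t}\\ {u}_{2t}\\ {u}_{3t}\end{array}\right]$$

Here the 3 × 3 matrix inside the sum plays the role of Ai in Eq. (9), so the coefficients γ1i are the ones that link lagged sentiment to the current price return.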

The second model, a three-variable VAR model of Pt, Vt, and Snegt, is as follows:

$${P}_{t}={C}_{4}+{\sum }_{i=1}^{p}{\alpha }_{4i} {P}_{t-i}+{\sum }_{i=1}^{p}{\beta }_{4i} {V}_{t-i}+{\sum }_{i=1}^{p}{\gamma }_{4i} {S}_{t-i}^{neg}+{u}_{4t}$$
(13)
$${V}_{t}={C}_{5}+{\sum }_{i=1}^{p}{\alpha }_{5i} {P}_{t-i}+{\sum }_{i=1}^{p}{\beta }_{5i} {V}_{t-i}+{\sum }_{i=1}^{p}{\gamma }_{5i} {S}_{t-i}^{neg}+{u}_{5t}$$
(14)
$${S}_{t}^{neg}={C}_{6}+{\sum }_{i=1}^{p}{\alpha }_{6i} {P}_{t-i}+{\sum }_{i=1}^{p}{\beta }_{6i} {V}_{t-i}+{\sum }_{i=1}^{p}{\gamma }_{6i} {S}_{t-i}^{neg}+{u}_{6t}$$
(15)

The third model, a three-variable Vector Autoregression (VAR) model involving Pt, Vt, and Svalt, is expressed by the following formula:

$${P}_{t}={C}_{7}+{\sum }_{i=1}^{p}{\alpha }_{7i} {P}_{t-i}+{\sum }_{i=1}^{p}{\beta }_{7i} {V}_{t-i}+{\sum }_{i=1}^{p}{\gamma }_{7i} {S}_{t-i}^{val}+{u}_{7t}$$
(16)
$${V}_{t}={C}_{8}+{\sum }_{i=1}^{p}{\alpha }_{8i} {P}_{t-i}+{\sum }_{i=1}^{p}{\beta }_{8i} {V}_{t-i}+{\sum }_{i=1}^{p}{\gamma }_{8i} {S}_{t-i}^{val}+{u}_{8t}$$
(17)
$${S}_{t}^{val}={C}_{9}+{\sum }_{i=1}^{p}{\alpha }_{9i} {P}_{t-i}+{\sum }_{i=1}^{p}{\beta }_{9i} {V}_{t-i}+{\sum }_{i=1}^{p}{\gamma }_{9i} {S}_{t-i}^{val}+{u}_{9t}$$
(18)

Similarly, the formulas for the three-variable VAR models incorporating Pt, Vt, and Sarot, as well as Pt, Vt, and Sdomt, can be represented, but these are omitted here.

The Akaike Information Criterion (AIC) is then employed to determine the appropriate lag length for the VAR models. AIC, proposed by Akaike (1974), is a statistical measure defined as follows:

$$AIC={\text{ln}}\left(\frac{SSR}{T}\right)+\left(K+1\right)\frac{2}{T}$$
(19)

Here, T represents the sample size, SSR represents the sum of squared residuals, and K represents the number of explanatory variables.
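
A minimal sketch of lag selection with Eq. (19) follows. The helper names and the SSR values are hypothetical. The sketch assumes K = 3p explanatory variables per equation (p lags of three variables) and holds T fixed across lags, which is a simplification because longer lags use up more initial observations.

```python
import math

def aic(ssr, n_obs, n_regressors):
    """AIC = ln(SSR / T) + (K + 1) * 2 / T, following Eq. (19)."""
    return math.log(ssr / n_obs) + (n_regressors + 1) * 2.0 / n_obs

def select_lag(ssr_by_lag, n_obs, n_vars=3):
    """Return the lag p minimizing AIC, taking K = n_vars * p (assumption)."""
    return min(ssr_by_lag, key=lambda p: aic(ssr_by_lag[p], n_obs, n_vars * p))

# Hypothetical sums of squared residuals for lags 1 to 3 (illustrative only).
ssr_by_lag = {1: 50.0, 2: 49.9, 3: 30.0}
print(select_lag(ssr_by_lag, n_obs=1000))
```

The second term penalizes each added regressor, so a longer lag is chosen only when it lowers the residual sum of squares by enough to offset the penalty.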

Subsequently, the Granger causality test is applied to the estimated VAR model to evaluate the predictive capability of one endogenous variable on another. Established by Granger (1969), the Granger causality test is extensively utilized in econometrics and other fields to examine lead-lag relationships between time series variables.

Impulse Response Functions (IRFs) are utilized to track the impact of a single shock to one variable on the current and future values of endogenous variables within a VAR system. This approach enables a dynamic examination of how shocks to the system propagate over time and affect the levels of variables.
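
The propagation of a shock can be sketched with the standard recursion Φ0 = I and Φh = Σj Φh−j Aj (j from 1 to min(h, p)), where entry (i, j) of Φh is the response of variable i, h periods after a unit shock to variable j. The sketch gives non-orthogonalized responses; whether the study's figures use orthogonalized shocks is not stated in the text. The coefficient matrix is hypothetical and is not an estimate from the paper.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(a, b):
    """Add two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def impulse_responses(coefs, horizon):
    """Return [Phi_0, ..., Phi_horizon] for lag matrices coefs = [A_1, ..., A_p]."""
    k = len(coefs[0])
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    phis = [identity]
    for h in range(1, horizon + 1):
        phi = [[0.0] * k for _ in range(k)]
        for j in range(1, min(h, len(coefs)) + 1):
            phi = mat_add(phi, mat_mul(phis[h - j], coefs[j - 1]))
        phis.append(phi)
    return phis

# Hypothetical VAR(1) coefficients, variables ordered as (P, V, S).
A1 = [
    [0.10, 0.00, 0.05],
    [0.00, 0.30, 0.00],
    [0.02, 0.01, 0.40],
]
phis = impulse_responses([A1], horizon=2)
# Response of P (row 0) to a unit shock in S (column 2) at horizons 1 and 2.
responses = [phis[h][0][2] for h in (1, 2)]
print(responses, sum(responses))
```

Summing the responses over the first horizons, as in the last line, gives a cumulative effect over a short window after the shock.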

3.3 Evaluation methods

Conducting hypothesis testing on the estimated residuals, which serve as the disturbance terms, is critical for assessing the appropriateness, reliability, and statistical validity of VAR models. The standard assumptions regarding the disturbance term ut in VAR models are expressed in the following forms:

$$E\left({u}_{t}\right)=0, \quad t=1,2,\cdots ,T$$
(20)
$$Var\left({u}_{t}\right)=E\left({u}_{t}{u}_{t}^{\prime}\right)={\Sigma }_{u}=\left[\begin{array}{ccc}{\sigma }_{11}& \cdots & {\sigma }_{1k}\\ \vdots & \ddots & \vdots \\ {\sigma }_{k1}& \cdots & {\sigma }_{kk}\end{array}\right]$$
(21)
$$t=1,2,\cdots ,T$$
(22)
$$Cov\left({u}_{t},{u}_{s}\right)=E\left({u}_{t}{u}_{s}^{\prime}\right)=0, \quad t\ne s$$
(23)

Equation (21) represents the multivariate version of homoscedasticity, signifying that the covariance matrix is constant over time. Equation (23) indicates that the disturbance term ut is serially uncorrelated. Furthermore, the assumption that the disturbance term ut follows a normal distribution underpins methodologies such as Maximum Likelihood Estimation. Consequently, we conducted three tests on the disturbance terms, examining the absence of autocorrelation, homoscedasticity, and normality.

To test for the absence of autocorrelation in the disturbance term, the Ljung–Box test is applied. The Ljung–Box test, proposed by Ljung and Box (1978), is a portmanteau test that aggregates the squared autocorrelations of residuals across multiple lags. Under the null hypothesis of no autocorrelation, its test statistic asymptotically follows a chi-squared distribution. A statistically significant outcome of this test suggests the presence of autocorrelation in the residuals, indicating that the VAR model may not have fully captured the fundamental dynamics of the time series data. Conversely, an insignificant result from the Ljung–Box test provides no evidence of such autocorrelation and implies that the residuals are essentially random, thereby reinforcing the reliability of the VAR model.
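
The statistic itself is short to compute: Q = n(n + 2) Σk ρk² / (n − k), summed over lags k = 1, …, h, where ρk is the lag-k sample autocorrelation of the residuals. The sketch below is a univariate illustration with a hypothetical function name and an artificial residual series. The degrees-of-freedom adjustment sometimes applied to residuals of a fitted model is ignored.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag for a single residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# Artificial residuals that flip sign every step (strong negative autocorrelation).
residuals = [1.0, -1.0] * 50
# Rough benchmark: the 5% chi-squared critical value is 3.841 for 1 lag
# and 18.307 for 10 lags.
print(ljung_box_q(residuals, max_lag=1))
```

A value of Q above the critical value for the chosen number of lags rejects the null hypothesis of no autocorrelation.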

To test for homoscedasticity of the disturbance terms, the ARCH-LM test is applied. The ARCH-LM test, developed by Engle (1982), is a widely recognized approach for detecting Autoregressive Conditional Heteroscedasticity (ARCH) effects in regression residuals over a specific number of lags. It entails fitting an autoregressive model to the squared residuals from a VAR model, followed by using the Lagrange Multiplier method to evaluate the presence of time-varying volatility in these residuals.
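
For a single lag, the procedure reduces to a simple regression of the squared residual on its previous value, with LM = (number of observations) × R². The sketch below covers only this one-lag special case, with a hypothetical function name and artificial residuals; the general test regresses on several lags.

```python
def arch_lm_1(residuals):
    """LM statistic (n_obs * R^2) from regressing e_t^2 on a constant and e_{t-1}^2."""
    sq = [e * e for e in residuals]
    x, y = sq[:-1], sq[1:]
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # squared residuals are constant: no ARCH pattern to detect
    r_squared = sxy * sxy / (sxx * syy)
    return m * r_squared

# Artificial residuals: a calm regime followed by a volatile regime.
residuals = [1.0, -1.0] * 25 + [5.0, -5.0] * 25
# With one lag, the 5% chi-squared critical value is 3.841.
print(arch_lm_1(residuals))
```

A large statistic indicates that periods of large residuals cluster together, which is the pattern the homoscedasticity assumption rules out.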

To test for normality of the disturbance terms, the Jarque–Bera test is employed. The Jarque–Bera test, developed by Jarque and Bera (1980), is derived from the Lagrange multiplier procedure; its statistic is simple to compute and asymptotically follows a χ² distribution.
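
In its univariate form the statistic is JB = (n/6)(S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis of the residuals. The sketch below is an equation-by-equation illustration with a hypothetical function name and artificial data, not the multivariate variant that can also be applied to VAR residuals. With two degrees of freedom, the χ² p-value has the closed form exp(−JB/2).

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, p-value) using the chi-squared(2) reference distribution."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of chi-squared with 2 df
    return jb, p_value

# Artificial two-point residuals: symmetric but far too thin-tailed to be normal.
print(jarque_bera([-1.0, 1.0] * 50))
```

A small p-value rejects normality, which financial return residuals often do because of heavy tails.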

4 Results

The analysis results using a three-variable VAR model with Pt, Vt, and Spost are presented as follows. The unit root verification results via the ADF test are shown in Appendix Table 14, confirming the absence of unit roots in all time series. From the AIC values displayed in Appendix Table 15, the appropriate lag order for the VAR model was determined to be 14. The coefficients, standard errors, and p-values for each variable in the VAR model with 14 lags are presented in Appendix Tables 16, 17, and 18. Notably, the standard errors in these tables are Newey–West standard errors, which are robust to heteroskedasticity and autocorrelation, in accordance with Newey and West (1987). Finally, the results of Granger causality tests among pairs of variables are presented in Table 8, and the impulse responses are illustrated in Fig. 10. The results in Table 8 reveal no statistically significant Granger causality between any pair of variables. Figure 10 and Table 16 suggest that positive sentiments influence price fluctuations positively for the first two days following a shock. The impact of the shock gradually diminishes, but a significant response reappears after the 9th day, and by the 14th day it has nearly converged.

Table 8 Granger causality test results (Pt, Vt, and Spost)
Fig. 10 Impulse response results (Pt, Vt, and Spost)

Similarly, the analysis results for the three-variable VAR model with Pt, Vt, and Snegt are presented. The ADF test results in Appendix Table 14 confirm the absence of unit roots in all time series. The appropriate lag order for the VAR model, determined from the AIC values shown in Appendix Table 20, was 15. The coefficients, standard errors, and p-values for each variable in the VAR model with 15 lags are presented in Appendix Tables 21, 22, and 23, the results of Granger causality tests among the variables in Table 9, and the impulse responses in Fig. 11. The results in Table 9 reveal that negative sentiments from celebrities have a statistically significant impact on future price fluctuations, and that volume changes significantly affect future negative sentiments. Notably, when the same VAR analysis was conducted with the period split into two 18-month intervals, the former characteristic was more pronounced in the latter half, which had higher price volatility, while the latter characteristic was evident in the first half. Figure 11 and Table 21 show that negative sentiments negatively affect price fluctuations up to the second day after a shock, with a particularly significant negative response observed on the second day. The response diminishes after the third day but, akin to positive sentiments, a significant response reemerges after the 9th day and nearly converges by the 15th day.

Table 9 Granger causality test results (Pt, Vt, and Snegt)
Fig. 11 Impulse response results (Pt, Vt, Snegt)

Next, the analysis results of a three-variable VAR model using VAD sentiment are presented. The analysis of Pt, Vt, and valence sentiment Svalt includes ADF test results in Appendix Table 24, AIC values in Appendix Table 25, statistical values for each variable in the VAR model in Appendix Tables 26, 27, and 28, Granger causality results in Table 10, and impulse responses in Fig. 12. For the analysis of Pt, Vt, and arousal sentiment Sarot, ADF test results are in Appendix Table 29, AIC values in Appendix Table 30, statistical values in Appendix Tables 31, 32, and 33, Granger causality results in Table 11, and impulse responses in Fig. 13. The analysis of Pt, Vt, and dominance sentiment Sdomt includes ADF test results in Appendix Table 32, AIC values in Appendix Table 33, statistical values in Appendix Tables 34–36, Granger causality results in Table 12, and impulse responses in Fig. 14. The only statistically significant Granger causality observed was from Svalt to Pt. Examining the impulse response of Pt to a shock in Svalt, the first two days after the shock show a positive impact on price fluctuations. This characteristic is consistent with the impulse response from positive sentiment Spost to Pt. Additionally, a significant spike is observed after the 9th day, a feature common to the impulse responses from both positive sentiment Spost and negative sentiment Snegt to Pt.

Table 10 Granger causality test results (Pt, Vt, and Svalt)
Fig. 12 Impulse response results (Pt, Vt, and Svalt)

Table 11 Granger causality test results (Pt, Vt, and Sarot)
Fig. 13 Impulse response results (Pt, Vt, and Sarot)

Table 12 Granger causality test results (Pt, Vt, and Sdomt)
Fig. 14 Impulse response results (Pt, Vt, and Sdomt)

Furthermore, to compare the magnitude of the shock caused by one unit of sentiment on future price fluctuations in the first two days across the five VAR models, the impulse response values were summed, as shown in Table 13. The impact of negative sentiment was approximately 1.2 times greater than that of positive sentiment. The magnitude of the impact over two days caused by valence sentiment was almost the same as that of positive sentiment. The impacts over two days of arousal sentiment and dominance sentiment were smaller than those of the other sentiments.

Table 13 Impulse response values for the first 2 days after shock

The results of hypothesis tests for the residuals of the VAR models are presented in Tables 39–53 of Appendix 2. While the Ljung–Box test results were favorable, the ARCH-LM and Jarque–Bera tests rejected the null hypothesis in some instances.

5 Conclusions

This study analyzed the impact of X sentiments (positive, negative, valence, arousal, dominance) of 30 influential celebrities in the Bitcoin community on future Bitcoin prices and volumes using a three-variable VAR model. The Granger causality tests revealed that negative sentiments significantly influence future Bitcoin price fluctuations. Analysis of impulse responses indicated that positive and valence sentiments lead to a positive impact on price fluctuations for up to two days following the sentiment's occurrence, whereas negative sentiments have a negative impact for the same duration. These results suggest that the sentiments expressed by influential celebrities directly affect the short-term emotions and actions of Bitcoin market participants. Furthermore, it was observed that the impact of all sentiments on price fluctuations tends to converge within approximately two weeks, indicating that celebrities' sentiments do not have a lasting effect on future Bitcoin prices. This outcome aligns with the “Sentiment Theory”, suggesting that while celebrities' statements on social media may not contain intrinsic information about future Bitcoin prices, they could include information about market sentiment. Another intriguing finding from these results is that the impact of negative sentiments on immediate price fluctuations was greater than that of positive sentiments. This aligns with psychological research, like that of Baumeister et al. (2001), which shows that negative information has a stronger effect on individual cognition and behavior than positive information, supporting the findings of this study. Additionally, these results are consistent with the loss aversion concept of Prospect Theory proposed by Kahneman and Tversky (1979), indicating that market participants are overly sensitive to potential losses and risks, potentially leading to significant price volatility. 
However, as Loughran and McDonald (2016) highlight, positive words are often used ambiguously in English, particularly in financial documents. While negative words are rarely negated to form positive expressions, positive words are frequently used to construct negative statements. Sentiments classified as positive may therefore be ambiguous and may not necessarily have a positive impact on the market.
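To illustrate how an impulse response can be positive for a few days and then fade, the following minimal sketch traces a one-time sentiment shock through a three-variable VAR. It is an illustration under stated assumptions, not the estimated model of this study: the lag order of one, the variable ordering (sentiment, price change, volume change), and every coefficient in `coef` are hypothetical, and the responses are not orthogonalized.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def impulse_response(coef, shock, horizon):
    """Responses of a VAR(1), y_t = A y_{t-1} + u_t, to a one-time shock.

    Returns a list of response vectors for days 0..horizon, where the
    response on day h equals A^h applied to the initial shock.
    """
    path = [list(shock)]
    for _ in range(horizon):
        path.append(mat_vec(coef, path[-1]))
    return path


# Hypothetical coefficients; rows and columns are ordered as
# [sentiment, price change, volume change].
coef = [
    [0.3, 0.0, 0.0],
    [0.2, 0.1, 0.0],
    [0.1, 0.0, 0.4],
]

# Unit shock to sentiment on day 0, traced over 14 days.
responses = impulse_response(coef, shock=[1.0, 0.0, 0.0], horizon=14)
price_path = [day[1] for day in responses]
```

With these illustrative coefficients, the price response is positive on the first days after the shock and shrinks toward zero by day 14. This happens because all eigenvalues of the hypothetical coefficient matrix lie inside the unit circle, which is the stability condition under which VAR impulse responses die out.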

The significance of this study lies in its analysis of Bitcoin price fluctuations using the X sentiments not only of globally recognized figures like Elon Musk but also of influencers who garner widespread support from market participants. The findings can help investors and market analysts refine their investment strategies and risk management and make more effective investment decisions based on fluctuations in market sentiment. Compared to similar research by Kraaijeveld and De Smedt (2020), this study is novel in employing five distinct sentiment measures for a multifaceted analysis. Additionally, while their research had to remove bot-generated posts from a large number of unspecified accounts, this study uses a fixed set of accounts, which eliminates the need to consider bots and significantly reduces the analytical effort, a potential advantage for investors.

However, this study has several limitations. Firstly, the validity of the selected celebrity accounts is open to question. The representativeness of these accounts as opinion leaders in the Bitcoin market has not been sufficiently verified, necessitating further analysis with data from different periods and various accounts. Secondly, the characteristics of X are not taken into account. X has features such as “likes,” “reposts,” and comments, which allow opinions to form and spread over time. For example, statements with substantial impact may spread through reposts on X, producing a viral effect that amplifies them and subsequently results in noticeable market reactions. However, how quickly and extensively X's viral effects influence users has been largely unexplored, so this explanation remains speculative and is an important area for future research. Thirdly, the study does not consider the immediate impact of X sentiment. Because the VAR framework used here considers only lagged effects, from the previous day backward, same-day effects of X sentiment are not captured. Future empirical research will have to address this aspect as well. Lastly, the study does not account for the intensity of sentiment. Developing methods to incorporate sentiment intensity into the analysis could lead to more accurate predictions of market trends. Future research is expected to address these issues and construct a more comprehensive and accurate predictive model for Bitcoin market trends.

As of December 2023, the United States has not yet approved a spot Bitcoin ETF (Exchange-Traded Fund), but its eventual approval could represent a structural change for the Bitcoin market. The introduction of a spot ETF is expected to open Bitcoin investment broadly to both retail and institutional investors, potentially encouraging an influx of new capital. Moreover, it could signify the wider acceptance of Bitcoin as a conventional financial asset, marking an important step towards the maturation of the market. This study, conducted against the backdrop of an immature Bitcoin market, has demonstrated one potential approach to this nascent field. The approval of a spot Bitcoin ETF would not only open the possibility for the Bitcoin market to mature but also pave the way for future analysis.