1 Introduction

Financial analysis and computational finance have been active areas of research for many decades [17]. Over the years, several new tools and methodologies have been developed that aim to predict the direction as well as the range of financial market instruments as accurately as possible [16]. Before the emergence of the internet, information regarding a company’s stock price, direction and general sentiment took a long time to disseminate among people. Companies and markets also took a long time (weeks or months) to calm market rumors, news or false information (memes in the Twitter context). Web \(3.0\) is characterized by fast-paced information dissemination as well as retrieval [6]. Spreading good or bad information regarding a particular company, product, person etc. can be done at the click of a mouse [1, 7] or even using micro-blogging services such as Twitter [4]. Recently scholars have made use of Twitter feeds in predicting box office revenues [2], political game wagons [29], rate of flu spread [27] and disaster news spread [11]. For short term trading decisions, short term sentiment plays a very important role in the performance of financial market instruments such as indexes, stocks and bonds [24].

Early works on stock market prediction can be summarized as attempts to answer the question: can stock prices really be predicted? There are two theories: (1) random walk theory and (2) the efficient market hypothesis (EMH) [22]. According to the EMH, a stock index largely reflects the news already existing in the investor community rather than present and past prices. On the other hand, random walk theory argues that the prediction can never be accurate since the time instance of news is unpredictable. Research conducted by Qian et al. compared and summarized several theories that challenge the basics of the EMH as well as the random walk model [23]. Based on these theories, it has been shown that some level of prediction is possible based on various economic and commercial indicators. The widely accepted semi-strong version of the EMH claims that prices aggregate all publicly available information and instantly reflect new public information [19]. It is well accepted that news drives macro-economic movement in the markets, while research suggests that social media buzz is highly influential at the micro-economic level, especially in big indices like the DJIA [5, 14, 20, 26]. Earlier research has validated that the market is completely driven by the sentiments and bullishness of investors’ decisions [23]. Thus a comprehensive model that incorporates these sentiments as a parameter is bound to give superior prediction at the micro-economic level.

Earlier work by Bollen et al. shows how collective mood on Twitter (aggregate of all positive and negative tweets) is reflected in the DJIA index movements [5, 20]. In this work we apply a simple message board approach by defining bullishness and agreement terminologies derived from the positive and negative vector ends of public sentiment w.r.t. each market security or index term (such as returns, trading volume and volatility) [25]. The proposed method is not only scalable but also gives a more accurate measure of large scale investor sentiment that can potentially be used for short term hedging strategies, as discussed in Sect. 6. This gives a clear and distinctive way of modeling sentiments for service based companies such as Google in contrast to product based companies such as Ebay, Amazon and Netflix. We validate that the Twitter feed for any company reflects the public mood dynamics comprising breaking news and discussions, which is causative in nature. Therefore it adversely affects investment related decisions, which are not limited to stock discussions or the profile of mood states of the entire Twitter feed.

In Sect. 2 we discuss the motivation of this work, and in Sect. 3 the related work in the area of stock market prediction. In Sect. 4 we explain the techniques used in mining data and the terminologies used in market and tweet board literature. In Sect. 5 we give the prediction methods used in this model along with the forecasting results. In Sect. 6 we discuss how a Twitter based model can be used by any trader for improving hedging decisions in a diversified portfolio. Finally, in Sect. 7 we discuss the results and in Sect. 8 we present the future prospects and conclude the work.

2 Motivation

Communities of active investors and day traders who are sharing opinions and in some case sophisticated research about stocks, bonds and other financial instruments will actually have the power to move share prices ...making Twitter-based input as important as any other data related to the stock

–TIME (2009) [21]

High Frequency Trading (HFT) comprises a very high percentage of trading volume on present-day US stock exchanges. Traders take an investment position that is held only for very brief periods of time, sometimes only for a few seconds. Investors rapidly trade into and out of those positions, sometimes thousands or tens of thousands of times a day. Therefore the value of an investment is only as good as the last known index price. Most investors will make use of anything that will give them an advantage in placing market bets. A large percentage of high frequency traders have trained AI bots to capture buzzing trends in social media feeds without learning the dynamics of the sentiment and the accurate context of the deeper information being diffused in the social networks. For example, in February 2011 during the Oscars, when Anne Hathaway was trending, the stock price of Berkshire Hathaway rose by 2.94 % [28]. Figure 1 highlights the incidents when the stock price of Berkshire Hathaway jumped, coinciding with an increase of buzz on social networks and micro-blogging websites regarding Anne Hathaway (for example during movie releases).

Fig. 1

Historical chart of Berkshire Hathaway (BRK.A) stock over the last 3 years. Highlighted points (A-F) are the days when its stock price jumped due to an increased news volume on social networks and Twitter regarding Anne Hathaway. Courtesy Google Finance

The events are marked as red points in Fig. 1. The event specific news at each point is:

A: Oct. 3, 2008 - Rachel Getting Married opens: BRK.A up 0.44 %

B: Jan. 5, 2009 - Bride Wars opens: BRK.A up 2.61 %

C: Feb. 8, 2010 - Valentine’s Day opens: BRK.A up 1.01 %

D: March 5, 2010 - Alice in Wonderland opens: BRK.A up 0.74 %

E: Nov. 24, 2010 - Love and Other Drugs opens: BRK.A up 1.62 %

F: Nov. 29, 2010 - Anne announced as co-host of the Oscars: BRK.A up 0.25 %

G: Feb. 28, 2011 - Anne hosts Oscars with James Franco: BRK.A up 2.94 %

As seen in this example, a large volume of tweets can create short term influential effects on stock prices. Simple observations such as these motivate us to investigate the deeper relationship between the dynamics of social media messages and market movements [17]. This work is not directed towards finding a new stock prediction technique, which would certainly include the effects of various other macroeconomic factors. The aim of this work is to quantitatively evaluate the effects of Twitter sentiment dynamics around stock indices and stock prices, and to use it in conjunction with the standard model to improve the accuracy of prediction. Further, in Sect. 6 we investigate how tweets can be useful in identifying trends in futures and options markets and in building hedging strategies to protect one’s investment position in the shorter term.

3 Related Work

There have been several works related to web mining of data (blogposts, discussion boards and news) [3, 12, 14] that validate the significance of assessing behavioral changes in the public mood to track movements in stock markets. Some earlier work shows that information from investor communities is causative of speculation regarding private and forthcoming information and commentaries [8, 9, 18, 30]. Dewally in 2003 worked on a naive momentum strategy, confirming that stocks recommended through user ratings had significant prior performance in returns [10]. But now, with the pragmatic shift in the online habits of communities around the world, platforms like StockTwits [31] and HedgeChatter have emerged. Das and Chen made the initial attempts by using natural language processing algorithms to classify stock messages based on human trained samples. However, their results did not show statistically significant predictive relationships [9].

Gilbert et al. and Zhang et al. have used a corpus of LiveJournal blogposts to assess bloggers’ sentiment in the dimensions of fear, anxiety and worry, making use of Monte Carlo simulation to reflect market movements in the S&P 500 index [14, 32]. Similar and significantly accurate work was done by Bollen et al. [5], who used the dimensions of the Google Profile of Mood States to reflect changes in the closing price of the DJIA. Sprengers et al. analyzed individual stocks for S&P 100 companies and tried correlating tweet features from discussions about the particular companies containing the ticker symbol [26]. However, these approaches have been restricted to community sentiment at the macro-economic level, which doesn’t give an explanatory dynamic system for the individual stock index of companies, as discussed in our previous work [25]. We therefore derive a model that is scalable to individual stocks and companies and can be exploited to make successful hedging strategies, as discussed in Sect. 6.

4 Web Mining and Data Processing

In this section we describe our method of Twitter and financial data collection, as shown in Fig. 2. In the first phase, we mine the tweet data; after removal of spam/noisy tweets, the tweets are subjected to sentiment assessment tools in phase two. In later phases, feature extraction, aggregation and analysis are done.

Fig. 2

Flowchart of the proposed methodology. Four sets of results are obtained: (1) correlation results for Twitter sentiments and stock prices for different companies, (2) Granger’s causality analysis to test causation, (3) use of EMMS for quantitative comparison, and (4) performance of the forecasting method over different time windows

4.1 Tweets Collection and Processing

Among investor forums and discussion boards, Twitter has the widest acceptance in the financial community, and all the messages are accessible through a simple search of requisite terms through an application programming interface (API). Sub forums of Twitter like StockTwits and TweetTrader have emerged recently as the hottest places for investor discussion of buy/sell decisions at a voluminous rate. Efficient mining of the sentiment aggregated around these tweet feeds gives us an opportunity to trace the relationships between these market sentiment terminologies. Currently more than 250 million messages are posted on Twitter every day (Techcrunch, October 2011).

This study was conducted over a period of 14 months, from June 2nd 2010 to July 29th 2011. During this period, we collected 4,025,595 English language tweets (by around 1.08 M users). Each tweet record contains a tweet identifier, date/time of submission (in GMT), language and text. Subsequently the stop words and punctuation are removed and the tweets are grouped by day (which is the finest time window in this study, since we do not group tweets further by hours or minutes). We have directed our focus to the DJIA, the NASDAQ-100 and the 11 major companies listed in Table 1. These companies are some of the most highly traded and discussed technology stocks, with very high tweet volumes.
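
The cleaning and daily grouping step can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the stop-word list, the record field names (`created_at`, `text`) and the timestamp format are assumptions made only for this example.

```python
import string
from collections import defaultdict
from datetime import datetime

# Hypothetical stop-word list; the study does not state which list was used.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}

def clean_text(text):
    """Lower-case a tweet, strip ASCII punctuation and drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]

def group_by_day(records):
    """Group cleaned tweets by calendar day of the (GMT) submission time."""
    by_day = defaultdict(list)
    for rec in records:
        # Assumed timestamp layout: "YYYY-MM-DD HH:MM:SS".
        day = datetime.strptime(rec["created_at"], "%Y-%m-%d %H:%M:%S").date()
        by_day[day.isoformat()].append(clean_text(rec["text"]))
    return dict(by_day)

# Made-up records, for illustration only.
sample = [
    {"id": 1, "created_at": "2010-06-02 14:05:00", "text": "The $AAPL rally is on!"},
    {"id": 2, "created_at": "2010-06-02 23:59:10", "text": "Selling Google today, too risky."},
    {"id": 3, "created_at": "2010-06-03 00:01:45", "text": "Oracle earnings are out"},
]
daily = group_by_day(sample)
```

Note that stripping punctuation this way also removes the `$` prefix of ticker symbols, so a real pipeline would probably match tickers before cleaning.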

Table 1 List of companies

The 11 tech companies are selected on the basis of average message volume: a company is included in the analysis if its average tweet volume is more than the tweet discussion volume of the DJIA and NASDAQ-100, as observed in Fig. 3. In this study we have observed that technology stocks generally have a higher tweet volume than non-technology stocks. One reason for this may be that technology companies come out with new products and announcements much more frequently than companies in other sectors (say infrastructure, energy, FMCG, etc.), thereby generating greater buzz on social media networks. However, our model may be applied to any company or index that generates high tweet volume.

4.2 Sentiment Classification

In order to compute sentiment, we classify each incoming tweet every day as positive or negative using a naive Bayes classifier. For each day the total number of positive tweets is aggregated as \(Positive_{day}\), and the total number of negative tweets as \(Negative_{day}\). We have made use of the JSON API from Twittersentiment, a service provided by the Stanford NLP research group [15]. The online classifier uses the Naive Bayesian classification method, which is one of the most successful and highly researched classification algorithms, giving superior performance to other methods in the context of tweets. Its classification training was done over a dataset of 1,600,000 tweets and achieved an accuracy of about 82.7 %. These methods have high replicability and few arbitrary fine tuning elements.

Fig. 3

Graph for average of log of daily volume over the months under study

In our dataset roughly 61.68 % of the tweets are positive, while 38.32 % of the tweets are negative for the company stocks under study. The ratio of 3:2 indicates stock discussions to be much more balanced in terms of bullishness than internet board messages where the ratio of positive to negative ranges from 7:1 [10] to 5:1 [12]. Balanced distribution of stock discussion provides us with more confidence to study information content of the positive and negative dimensions of discussion about the stock prices on microblogs.

4.3 Tweet Feature Extraction

One of the research questions this study explores is how investment decisions for technology stocks are affected by the entropy of information spread about the companies under study in the virtual space. Tweet messages are micro-economic factors that affect stock prices, which is quite a different type of relationship than that of factors like news aggregates from traditional media, chat board rooms etc., which are covered in earlier studies over a particular period [10, 12, 18]. Keeping this in mind, we have only aggregated the tweet parameters (extracted from tweet features) over a day. In order to calculate parameters weekly, bi-weekly, tri-weekly, monthly, 5 weekly and 6 weekly, we have simply taken the average of the daily Twitter feeds over the requisite period of time.
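
The averaging of daily parameters over longer windows can be sketched as follows. The text does not say whether windows overlap or how many days make up a week, so this example assumes consecutive, non-overlapping blocks of a caller-chosen length and averages a shorter trailing block as it is.

```python
def window_average(daily_values, window_days):
    """Average consecutive, non-overlapping blocks of daily values.

    Assumption: windows do not overlap; a shorter block at the end of the
    series is averaged over the days it actually contains.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    averages = []
    for start in range(0, len(daily_values), window_days):
        block = daily_values[start:start + window_days]
        averages.append(sum(block) / len(block))
    return averages

# Eight made-up daily bullishness values, averaged over 7-day windows.
weekly = window_average([1, 2, 3, 4, 5, 6, 7, 8], 7)
```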

Fig. 4

Tweet sentiment and market features

Twitter literature from the perspective of stock investment is summarized in Fig. 4. We have carried forward the work of Antweiler et al. in defining bullishness (\(B_t\)) for each day (or time window), given by the equation:

$$\begin{aligned} B_t = \ln \left( {\frac{1+{M_t}^{Positive}}{1+{M_t}^{Negative}}}\right) \end{aligned}$$
(1)

where \({M_t}^{Positive}\) and \({M_t}^{Negative}\) represent the number of positive and negative tweets on a particular day \(t\). The logarithm of bullishness measures the share of surplus positive signals and also gives more weight to a larger number of messages in a specific sentiment (positive or negative). Message volume for a time interval \(t\) is simply defined as the natural logarithm of the total number of tweets for a specific stock/index, which is \(\ln ({M_t}^{Positive}+ {M_t}^{Negative})\). The agreement among positive and negative tweet messages is given by:

$$\begin{aligned} A_t = 1- \sqrt{1- \left| \frac{M_t^{Positive}- M_t^{Negative}}{M_t^{Positive}+ M_t^{Negative}}\right| } \end{aligned}$$
(2)

If all tweet messages about a particular company are bullish (or all are bearish), agreement is \(1\). The influence of silent tweet days in our study (trading days when no tweeting happens about a particular company) is less than 0.1 %, which is significantly less than in previous research [12, 26]. The carried terminologies for all the tweet features {Positive, Negative, Bullishness, Message Volume, Agreement} are the same features taken at a lag of one day. For example, carried bullishness for day \(d\) is the bullishness of day \(d-1\), i.e. \(B_{d-1}\).
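
The three tweet features of this section (Eq. 1, message volume and Eq. 2) can be written down directly. The sketch below is illustrative only: the function names are our own, Eq. 2 is read with the absolute value under the square root, and raising an error on silent days (no tweets) is an assumption, since the text does not say how such days are treated.

```python
import math

def bullishness(pos, neg):
    """Eq. 1: log ratio of positive to negative tweet counts, each offset by 1."""
    return math.log((1 + pos) / (1 + neg))

def message_volume(pos, neg):
    """Natural log of the total tweet count; undefined on silent days."""
    total = pos + neg
    if total == 0:
        raise ValueError("no tweets in this window")
    return math.log(total)

def agreement(pos, neg):
    """Eq. 2: 1 when all tweets share one sentiment, 0 when evenly split."""
    total = pos + neg
    if total == 0:
        raise ValueError("no tweets in this window")
    return 1 - math.sqrt(1 - abs(pos - neg) / total)

def carried(series):
    """Lag a daily feature series by one day; the first day has no carried value."""
    return [None] + list(series[:-1])

# Made-up daily counts (positive, negative), for illustration only.
counts = [(9, 4), (10, 0), (5, 5)]
daily_bullishness = [bullishness(p, n) for p, n in counts]
carried_bullishness = carried(daily_bullishness)
```

With these counts, the second day (all tweets positive) has agreement 1 and the third day (evenly split) has agreement 0 and bullishness 0.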

4.4 Financial Data Collection

We have downloaded daily stock prices from the Yahoo Finance API for the DJIA, NASDAQ-100 and the companies under study given in Table 1. The financial features (parameters) under study are the opening (\(O_t\)) and closing (\(C_t\)) values of the stock/index, the highest (\(H_t\)) and lowest (\(L_t\)) values of the stock/index, and returns. Returns are calculated as the difference between the natural logarithms of the closing values of the stock price on a particular day and the previous day, expressed in percent.

$$\begin{aligned} R_t= \{{\ln Close_{(t)}-\ln Close_{(t-1)}}\}\times 100 \end{aligned}$$
(3)
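
As a small worked example of Eq. 3, the following sketch computes percentage log returns from a list of closing prices. The prices are made up for illustration.

```python
import math

def log_returns(closes):
    """Eq. 3: percentage log return between consecutive closing prices."""
    return [
        (math.log(today) - math.log(yesterday)) * 100
        for yesterday, today in zip(closes, closes[1:])
    ]

# A 10 % price rise corresponds to a log return of about 9.53 %.
returns = log_returns([100.0, 110.0, 110.0])
```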

Trading volume is the logarithm of the number of traded shares. We estimate daily volatility based on intra-day highs and lows using the Garman and Klass volatility measure [13], given by the formula:

$$\begin{aligned} \sigma = \sqrt{\frac{1}{n}\sum {\frac{1}{2}\left[ \ln {\frac{H_t}{L_t}}\right] ^2 - \left[ 2\ln {2}-1\right] \left[ \ln {\frac{C_t}{O_t}}\right] ^2}} \end{aligned}$$
(4)
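
Eq. 4 can be evaluated with a few lines of Python. This is a sketch under the assumption that each bar is an `(open, high, low, close)` tuple and that the sum runs over the \(n\) bars in the window; the bar values are made up.

```python
import math

def garman_klass_volatility(bars):
    """Eq. 4: Garman-Klass volatility over a list of (open, high, low, close) bars."""
    k = 2 * math.log(2) - 1
    terms = [
        0.5 * math.log(high / low) ** 2 - k * math.log(close / open_) ** 2
        for open_, high, low, close in bars
    ]
    return math.sqrt(sum(terms) / len(terms))

# One bar that opens and closes at 100 but trades between 100 and 110:
# only the high/low term contributes.
sigma = garman_klass_volatility([(100.0, 110.0, 100.0, 100.0)])
```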

5 Statistical Analysis and Results

We begin our study by identifying the correlation between the Twitter feed features and stock/index parameters, which gives encouraging values of statistically significant relationships with respect to individual stocks (indices). To validate the causative effect of tweet feeds on stock movements we have used the econometric technique of Granger’s Causality Analysis. Furthermore, we make use of the expert model mining system (EMMS) to propose an efficient prediction model for the closing price of the DJIA and NASDAQ \(100\). Since this model does not allow us to draw conclusions about the accuracy of prediction (which will differ across the size of the time window), that accuracy is discussed later in this section.

5.1 Correlation Matrix

For the stock indices DJIA and NASDAQ and the 11 tech companies under study, we present the correlation matrix between the financial market and Twitter sentiment features explained in Sect. 4 as the heatmap in Fig. 5. The financial features for each stock/index (Open, Close, Return, Trade Volume and Volatility) are correlated with the Twitter features (Positive, Negative, Bullishness, Carried Positive, Carried Negative and Carried Bullishness). The time period under study is the monthly average, as it is the most accurate time window and gives significant values compared to other time windows, as discussed later in Sect. 5.4. The heatmap in Fig. 5 is indicative of significant relationships between various Twitter features and the index features.

Fig. 5

Heatmap showing Pearson correlation coefficients between security indices versus features from Twitter

Our approach shows strong correlation values between various features (up to \(-0.96\) for the opening price of Oracle and \(0.88\) for returns from the DJIA index etc.), and the average value of correlation between various features is around \(0.5\). By comparison, the highest correlation value from earlier work has been around \(0.41\) [26]. As the relationships between the stock (index) parameters and Twitter features show different behavior in magnitude and sign for different stocks (indices), a uniform standardized model would not be applicable to all the stocks (indices). Therefore, building an individual model for each stock (index) is the correct approach for gaining appreciable insight into the prediction techniques. Trading volume is mostly governed by the agreement values of tweet feeds, with \(-0.7\) for same day agreement and \(-0.65\) for DJIA. Returns are mostly correlated to same day bullishness, by \(0.61\), and by a lesser magnitude of \(0.6\) to the carried bullishness for DJIA. Volatility is again dependent on most of the Twitter features, with correlation as high as \(0.77\) for same day message volume for NASDAQ-100.

One of the anomalies that we have observed is that EBay gives negative correlation between all the features, due to heavy product based marketing on Twitter, which turns out not to be a correct indicator of the average growth returns of the company itself.
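
Each cell of the heatmap in Fig. 5 is a Pearson correlation coefficient between one financial feature and one Twitter feature. A minimal sketch of that computation is given below; the two series are made-up numbers, not values from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly averages of bullishness and of returns.
monthly_bullishness = [0.10, 0.25, 0.15, 0.40, 0.30]
monthly_returns = [0.5, 1.1, 0.4, 1.9, 1.2]
r = pearson_r(monthly_bullishness, monthly_returns)
```

Repeating this for every pair of financial and Twitter features, per stock or index, yields one row of the correlation matrix.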

5.2 Bivariate Granger Causality Analysis

The results in the previous section show strong correlation between financial market parameters and Twitter sentiments. However, the results also raise a point of discussion: do market movements affect Twitter sentiments, or do Twitter features cause changes in the markets? We apply Granger Causality Analysis (GCA) to the time series averaged over a weekly time window, relating returns from the DJIA and NASDAQ-100 to the Twitter features (positive, negative, bullishness, message volume and agreement). GCA is not used to establish causality, but as an econometric tool to investigate a statistical pattern of lagged correlation. A similar observation, that clouds precede rain, is widely accepted. GCA rests on the assumption that if a variable X causes Y, then changes in X will systematically occur before the changes in Y. We therefore expect lagged values of X to bear significant correlation with Y. However, correlation does not necessarily imply causation. We have made use of GCA in a similar fashion to [5, 14], to test whether one time series is significant in predicting another time series. Let returns \(R_t\) be reflective of fast movements in the stock market. To verify the change in returns with the change in Twitter features, we compare the variance explained by the following linear models:

$$\begin{aligned} R_{t}= \alpha +\sum _{i=1}^{n}\beta _{i}R_{t-i}+ \varepsilon _{t} \end{aligned}$$
(5)
$$\begin{aligned} R_t= \alpha +\sum _{i=1}^{n}\beta _iR_{t-i}+ \sum _{i=1}^{n}{\gamma _i X_{t-i}}+ \varepsilon _t \end{aligned}$$
(6)

Equation 5 uses only the \(n\) lagged values of \(R_t\), i.e. (\(R_{t-1}, \ldots , R_{t-n}\)), for prediction, while Eq. 6 uses the \(n\) lagged values of both \(R_t\) and the tweet feature time series, given by \(X_{t-1}, \ldots , X_{t-n}\). We have taken a weekly time window to validate the causality performance, hence the lag values are calculated over the weekly intervals \(1,2,\ldots ,7\). From Table 2, we can reject the null hypothesis \((H_o)\) that the Twitter features do not affect returns in the financial markets, i.e. \(\gamma _{1,2,\ldots ,n} = 0\), with a high level of confidence (p-values closer to zero signify a stronger causative relationship). However, as we see, the result applies only to specific negative and positive tweets. Other features like agreement and message volume do not have a significant causal relationship with the returns of a stock index (high p-values).
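
The comparison between the restricted model (Eq. 5, own lags of returns only) and the unrestricted model (Eq. 6, own lags plus lagged tweet feature) is usually summarized by an F statistic on the two residual sums of squares. The sketch below shows that computation in plain Python on synthetic data. It is not the software used for Table 2; the helper names and the data-generating process are assumptions for illustration, and turning the F statistic into a p-value would additionally require the F distribution, which is left out here.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear regressors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))

def granger_f(returns, feature, n_lags):
    """F statistic for adding n_lags of `feature` to an AR(n_lags) model of returns."""
    targets, restricted, unrestricted = [], [], []
    for t in range(n_lags, len(returns)):
        own = [returns[t - i] for i in range(1, n_lags + 1)]
        ext = [feature[t - i] for i in range(1, n_lags + 1)]
        targets.append(returns[t])
        restricted.append([1.0] + own)           # Eq. 5
        unrestricted.append([1.0] + own + ext)   # Eq. 6
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * n_lags + 1)
    f_stat = ((rss_r - rss_u) / n_lags) / (rss_u / dof)
    return f_stat, rss_r, rss_u

# Synthetic weekly series in which the feature leads returns by one week.
rng = random.Random(0)
feature = [rng.gauss(0, 1) for _ in range(60)]
returns = [0.0] + [0.8 * feature[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 60)]
f_stat, rss_r, rss_u = granger_f(returns, feature, n_lags=2)
```

Because the synthetic returns are built from the previous week's feature value, the unrestricted model fits far better and the F statistic is large; for an unrelated feature it would stay close to 1.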

Table 2 Granger’s causality analysis of DJIA and NASDAQ for 7 week lag Twitter sentiments

5.3 EMMS Model for Forecasting

We have used the Expert Model Mining System (EMMS), which incorporates a set of competing methods such as Exponential Smoothing (ES), Auto Regressive Integrated Moving Average (ARIMA) and seasonal ARIMA models. In this work, the selection criterion for the EMMS is the coefficient of determination (R squared), which is the square of the Pearson correlation coefficient \(r\) between the fit values (from the EMMS model) and the actual observed values. Mean absolute percentage error (MAPE) and maximum absolute percentage error (MaxAPE) are the mean and maximum values of the error (the difference between fit value and observed value, in percentage). To show the performance of tweet features in the prediction model, we have applied the EMMS twice: first with tweet features as independent predictor events, and a second time without them. This provides us with a quantitative comparison of the improvement in prediction using tweet features.

Table 3 EMMS model fit characteristics for DJIA and NASDAQ-100

In the dataset we have time series for a total of approximately 60 weeks (422 days), out of which we use approximately 75 %, i.e. 45 weeks, for training both the models, with and without the predictors, over the time period June 2nd 2010 to April 14th 2011. Further, we verify the model performance as a one step ahead forecast over the testing period of 15 weeks from April 15th to July 29th 2011, which covers a wide and robust range of market conditions. Forecasting accuracy in the testing period is compared for both the models in each case in terms of maximum absolute percentage error (MaxAPE), mean absolute percentage error (MAPE) and the direction accuracy. MAPE is given by Eq. 7, where \(\hat{y_i}\) is the predicted value and \(y_i\) is the actual value.

$$\begin{aligned} MAPE= \frac{100}{n} \sum _{i=1}^{n} \left| \frac{y_i - \hat{y_i}}{y_i}\right| \end{aligned}$$
(7)

Direction accuracy is a measure of how accurately the up/down movement of the market or commodity is predicted by the model. It is technically defined through the logical values of \((\hat{y}_{i,t+1}- y_{i,t})\times (y_{i,t+1}- y_{i,t})>0\), i.e. whether the predicted and the actual change from \(t\) to \(t+1\) have the same sign.
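
The three accuracy measures used in this section can be computed as in the following sketch. The alignment convention (`predicted[t]` is the one step ahead forecast of `actual[t]`, and direction is judged relative to `actual[t-1]`) and the numbers are assumptions made for illustration.

```python
def mape(actual, predicted):
    """Eq. 7: mean absolute percentage error."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def max_ape(actual, predicted):
    """Maximum absolute percentage error."""
    return max(100 * abs((a - p) / a) for a, p in zip(actual, predicted))

def direction_accuracy(actual, predicted):
    """Share of steps where predicted and actual moves from actual[t-1] agree in sign."""
    hits = 0
    for t in range(1, len(actual)):
        predicted_move = predicted[t] - actual[t - 1]
        actual_move = actual[t] - actual[t - 1]
        if predicted_move * actual_move > 0:
            hits += 1
    return hits / (len(actual) - 1)

# Made-up weekly closing values and one step ahead forecasts.
actual = [100.0, 102.0, 101.0, 104.0]
predicted = [100.0, 101.0, 103.0, 103.0]
accuracy = direction_accuracy(actual, predicted)
```

In this toy series the forecast gets the direction right in weeks 1 and 3 and wrong in week 2, so the direction accuracy is 2/3.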

As we can see in Table 3, there is a significant reduction in MaxAPE for DJIA (2.37 to 1.76) and NASDAQ-100 (2.96 to 2.69) when the EMMS model is used with predictors as events, which in our case are all the tweet features (positive, negative, bullishness, message volume and agreement). There is a significant decrease in the value of MAPE for DJIA, which is \(0.8\) in our case compared with \(1.79\) for earlier approaches [5]. As we can see from the values of R-square, MAPE and MaxAPE in Table 3 for both DJIA and NASDAQ \(100\), our proposed model, which uses Twitter sentiment analysis, gives superior performance over traditional methods. Figure 6 shows the EMMS model fit for weekly closing values for DJIA and NASDAQ \(100\). In the figure, fit denotes the model fit values, observed denotes the values of the actual index, and UCL and LCL are the upper and lower confidence limits of the prediction model.

Fig. 6

Plot of Fit values (from the EMMS model) and actual observed closing values for DJIA and NASDAQ-100

Fig. 7

Plot of R-square values over different time windows for DJIA and NASDAQ-100. Higher values denote greater prediction accuracy

5.4 Prediction Accuracy Using OLS Regression

Our results in the previous section showed that the forecasting performance of stocks/indices using Twitter sentiments varies for different time windows. Hence it is important to quantitatively deduce a suitable time window that will give us the most accurate prediction. Figure 7 shows the plot of the R-square metric for OLS regression of returns from the stock indexes NASDAQ-100 and DJIA on tweet board features (number of positive and negative tweets, bullishness, agreement and message volume), both for carried (at 1-day lag) and same week features. From Fig. 7 it can be inferred that as we increase the time window the accuracy in prediction increases, but only up to a certain point, which is monthly in our case, beyond which the value of R-square starts decreasing again. Thus, for monthly predictions we have the highest accuracy in predicting anomalies in the returns from the tweet board features.

6 Hedging Strategy Using Twitter Sentiment Analysis

Portfolio protection is a very important practice that is weighted as much as portfolio appreciation. Just as a normal user purchases insurance for a house, car or any commodity, one can also buy insurance for an investment made in stock securities. This doesn’t prevent a negative event from happening, but if it does happen and you’re properly hedged, the impact of the event is reduced. In a diverse portfolio, hedging against investment risk means strategically using instruments in the market to offset the risk of any adverse price movements. Technically, to hedge, an investor invests in two securities with negative correlation, which is itself a time varying, dynamic statistic.

To explain how a weekly forecast based on mass tweet sentiment features can be potentially useful for an individual investor, we take the help of a simple example.

Let us assume that the share for a company C1 is available for $X per share and the cost of premium for a stock option of company C1 (with strike price $X) is $Y.

A = total amount invested in shares of a company C1 which is number of shares (let it be N) \(\times \) $X

B = total amount invested in put option of company C1 (relevant blocksize \(\times \) $Y)

And always for an effective investment (N \(\times \) $X) \(>\) ( Blocksize \(\times \) $Y)

An investor shall choose the value of N as per their risk appetite, i.e. the ratio A:B \(=\) 2:1 (assumed in our example; it will vary from investor to investor). This means that in rising market conditions he would like to keep 50 % of his investment completely guarded, while the remaining 50 % is the risky component; whereas in bearish market conditions he would like to keep his complete investment fully hedged by buying put options equivalent to all the investment he has made in shares of the same security. Figure 8 shows the P/L curves for shares and 2 different put options for the company C1 purchased at different time intervals; hence the different premium prices even with the same strike price of $X. Using the married put strategy makes the investment risk free but reduces the rate of return, in contrast to the case which comprises only the equity security and is completely exposed to market risk. Hence the success of the married put strategy depends greatly on the accuracy of predicting whether the markets will rise or fall. Our proposed tweet sentiment analysis can be highly effective in this prediction, determining the instances when the investor should readjust his portfolio before the actual changes happen in the market. Our proposed approach provides an innovative technique of using dynamic Twitter sentiment analysis to exploit the collective wisdom of the crowd for minimising the risk in a hedged portfolio. Below we summarize two different portfolio states under different market conditions (Table 4).
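
The effect of full versus partial hedging can be made concrete with a simplified profit/loss calculation at option expiry. The sketch below is an illustration under stated assumptions, not the simulation used in this work: it ignores transaction costs and the time value of the option, `hedged_shares` stands for the number of shares covered by put options (the block size above), and all prices are hypothetical.

```python
def married_put_pnl(price_at_expiry, n_shares, share_cost,
                    hedged_shares, strike, premium):
    """P/L at expiry of n_shares of stock plus puts covering hedged_shares."""
    stock_pnl = n_shares * (price_at_expiry - share_cost)
    # Each put pays max(strike - price, 0) at expiry and cost `premium` up front.
    put_pnl = hedged_shares * (max(strike - price_at_expiry, 0.0) - premium)
    return stock_pnl + put_pnl

# Hypothetical position: 100 shares bought at $100, puts struck at $100, premium $5.
fully_hedged_fall = married_put_pnl(80, 100, 100, 100, 100, 5)
partly_hedged_fall = married_put_pnl(80, 100, 100, 50, 100, 5)
unhedged_fall = married_put_pnl(80, 100, 100, 0, 100, 5)
fully_hedged_rise = married_put_pnl(120, 100, 100, 100, 100, 5)
```

With these numbers a fall to $80 costs $2,000 unhedged, $1,250 with half the shares hedged and only the $500 premium when fully hedged, while a rise to $120 yields $1,500 instead of $2,000 because of the premium paid. This is the trade-off between protection and rate of return described above.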

Table 4 Example investment breakdown in the two cases
Fig. 8

Portfolio adjustment in cases of bearish (fully hedged) and bullish (partially hedged) market scenarios. In both figures, the strike price is the price at which the option can be exercised, and the break even point (BEP) is the point at which the investment starts making a profit. In the bearish market scenario, two options at the same strike price (but different premiums) are purchased at different instances: Option1 bought at the time of initial investment and Option2 bought at a later stage (hence lower in premium value)

To check the effectiveness of our proposed tweet based hedging strategy, we run simulations and make portfolio adjustments in various market conditions (bullish, bearish, volatile etc). To elaborate, we take the example of DJIA ETFs as the underlying security over the time period of 14th November 2010 to 30th June 2011. Approximately 76 % of the time period is used in the training phase to tune the SVM classifier (using tweet sentiment features from the prior week). This trained SVM classifier is then used to predict market direction (the DJIA’s index movement) in the coming week. The testing phase for the classification model (class 1: bullish market \(\uparrow \); class 0: bearish market \(\downarrow \)) is from 8th May to 30th June 2011, consisting of a total of 8 weeks. The SVM model is built using the KSVM classification technique with the linear (vanilladot, best fit) kernel using the package ‘e1071’ in the R statistical framework. Over the training dataset, the tuned value of the objective function is obtained as \(-4.24\) and the number of support vectors is \(8\). The confusion matrix for the predicted over the actual values (in percentage) is given in Table 5. (Percentages do not sum to the full 100 %, as the remaining 12.5 % falls under a third class in which the value of the index does not change. This class is excluded from the current analysis due to limitations of the data period.) Overall classifier accuracy over the testing phase is \(85.7\,\%\). The receiver operating characteristic (ROC) curve, measuring the accuracy of the classifier as true positive rate against false positive rate, is given in Fig. 9. It shows the tradeoff between sensitivity, i.e. true positive rate, and specificity, i.e. true negative rate (any increase in sensitivity will be accompanied by a decrease in specificity). Good statistical significance for the classification accuracy can be inferred from the value of the area under the ROC curve (AUC), which comes out to \(0.88\).
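
The evaluation quantities mentioned above (confusion matrix, accuracy, sensitivity, specificity and AUC) can be computed from weekly labels and classifier scores as in the sketch below. It does not reproduce the SVM or the results in Table 5: the labels and scores are made up, the 0.5 decision threshold is an assumption, and the AUC is computed through its rank interpretation (the probability that a randomly chosen bullish week receives a higher score than a randomly chosen bearish week).

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels, 1 = bullish and 0 = bearish."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

def roc_auc(actual, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical weekly outcomes and classifier scores (not the study's data).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
predicted = [1 if s > 0.5 else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
auc = roc_auc(actual, scores)
```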

Table 5 Prediction accuracy over the testing phase (8 weeks). Values in percentage
Fig. 9

Receiver operating characteristic (ROC) curve for the KSVM classifier predictions over the testing phase. The ROC curve is a graphical plot of the sensitivity (true positive rate) versus the false positive rate (one minus the specificity, or true negative rate). The larger the area under the ROC curve, the better the performance of the machine learning algorithm

Figure 10 shows the DJIA index during the testing period; the arrows mark the weeks in which the portfolio is adjusted based on the prediction obtained from the tweet sentiment analysis of the prior week. At the end of each week (on Sunday), we use the tweet sentiment features to predict the market condition in the coming week, that is, whether prices will go down or up. Based on this prediction, the portfolio is adjusted: bearish \(\longrightarrow \) bullish or bullish \(\longrightarrow \) bearish.
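The weekly adjustment rule described above can be sketched as a small function. This is a minimal illustration, assuming the portfolio is switched only when the predicted direction differs from the current stance; the function name, the initial stance and the prediction sequence are hypothetical and do not correspond to the weeks marked in Fig. 10.

```python
def plan_adjustments(initial_stance, weekly_predictions):
    """Return (week, old_stance, new_stance) for every week in which the
    predicted direction differs from the current portfolio stance.

    weekly_predictions: 1 = bullish week predicted, 0 = bearish week predicted.
    """
    stance = initial_stance
    adjustments = []
    for week, prediction in enumerate(weekly_predictions, start=1):
        new_stance = "bullish" if prediction == 1 else "bearish"
        if new_stance != stance:
            adjustments.append((week, stance, new_stance))
            stance = new_stance
    return adjustments


# Hypothetical predictions for an 8-week testing phase.
predictions = [1, 1, 0, 0, 1, 0, 0, 1]
adjustments = plan_adjustments("bullish", predictions)
for week, old, new in adjustments:
    print(f"week {week}: {old} -> {new}")
```

Weeks in which the prediction agrees with the current stance produce no adjustment, so the number of adjustments is at most the number of test weeks.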

7 Discussions

In Sect. 5, we observed how the statistical behavior of the market, studied through Twitter sentiment analysis, provides a dynamic window onto investor behavior. Furthermore, in Sect. 6 we discussed how behavioral finance can be exploited in portfolio decisions to make investments with substantially reduced risk. Our work answers an important question: if someone is talking badly or well about a company (say, Apple) as a singular sentiment, irrespective of the overall market movement, is it going to affect the stock price? Among the 5 observed Twitter message features, at both same-day and lagged intervals, we find that only some Granger-cause the returns of the DJIA and NASDAQ-100 indexes, while changes in public sentiment are well reflected in the return series even at lags of \(1\), \(2\) and \(3\) weeks. Remarkably, the most significant result is obtained for returns at lag 2 (which can be interpreted as the possible direction of the stock/index movements in the next week).

Fig. 10

DJIA index during the testing period. A green arrow marks an adjustment bearish \(\longrightarrow \) bullish, while a red arrow marks an adjustment bullish \(\longrightarrow \) bearish. (The data point is at the centre of the box.) (Data courtesy of Yahoo! Finance)

Table 6 Comparison of various approaches for modeling market movements through Twitter

Table 6 summarizes the different approaches to the problem taken by researchers in the past [5, 14, 26]. As can be seen from the table, our approach is scalable, customizable and verified over a larger data set and time period than the other approaches. Our results are significantly better than those of the previous work. Furthermore, this model can be put to effective use in formulating short-term hedging strategies (using our proposed Twitter-based prediction model).

8 Conclusion

In this chapter, we have identified relationships between the Twitter-based sentiment analysis of a particular company or index and its short-term market performance, using a large-scale collection of tweet data. Our results show that the negative and positive dimensions of public mood carry a strong cause-effect relationship with the price movements of individual stocks and indices. We have also investigated various other aspects, such as how the previous week's sentiment features influence the next week's opening and closing values of stock prices for various tech companies and of major indexes like DJIA and NASDAQ-100. As Table 6 shows, earlier approaches in the area have been limited to overall public mood and to discussions restricted to stock tickers. In comparison, we verify the strong performance of our alternative model, which captures mass public sentiment towards a particular index or company in a scalable fashion and hence enables an individual investor to make coherent relative comparisons. Our analysis of individual company stocks gave strong correlation values (up to 0.88 for returns) with the Twitter sentiment features of that company. This approach is more robust and gives better results (up to 91 % directional accuracy) than the previous work.