1 Introduction

The accelerating global expansion of the Internet over the last 20 years has transformed our economic reality. Increasing digitization has not only been reflected in the emergence of IT giants such as eBay, Google or Amazon, but has also had a powerful impact on trading on electronic stock exchanges such as Xetra or Nasdaq [21].

The information revolution brings many advantages for the existing economy, but its challenges are becoming more obvious, too. Large amounts of data have become available, leading to rapid growth in the information relevant for decision-making in financial markets. Still, the key challenge is to discern the pivotal pieces of information from the many irrelevancies in the abundant data pool [20].

Sentiment analysis is one approach to structure and quantify qualitative, textual pieces of information within this increasing flow of information. Sentiment refers to the optimism or pessimism embedded in language [34]. Sentiment analysis is a method of quantifying the extent of optimism or pessimism as conveyed by textual information such as news releases or financial disclosures [23].

Text mining programs allow one to convert relevant data into information signals and utilize them for further analysis. The most important advantage of the digitization of the economy is the increasing efficiency of markets, which crucially depends on the availability of related data. Both producers and consumers also benefit from the widespread proliferation of information. Thanks to online commentaries and reviews, consumers can easily find out how other users evaluate a product. Producers profit from the ability to forecast demand, changes in the economic environment and exchange rates more precisely.

The everyday business of stock exchanges has seen radical changes due to digitization as well. Floor trading has all but disappeared as more trades are effected on electronic platforms. Analysts and investors are now subject to increasing time pressure while analyzing myriad isolated pieces of information and acting accordingly. Speed and quality of analysis are essential to the success of market participants. Liebmann et al. [21] demonstrate that analysts and investors vary in how long they need to digest new information. Under such circumstances, it becomes imperative to process incoming data as quickly and accurately as possible. The purpose of sentiment analysis is to scrutinize the latest ad hoc company news in a matter of mere seconds by turning a textual message into a single number, the sentiment value.

In this context, this paper aims to study how the sentiment encoded in the mandatory, stock-relevant ad hoc filings of stock-listed companies conveys relevant information about the state of the economy. We evaluate how this sentiment measure can explain future movements in stock indices, the scale of industrial orders and the unemployment rate. Our findings suggest that sentiment can be a relevant indicator, but that its effect on economic indicators may shift during or after financial crises.

This paper aims to contribute to the understanding of how sentiment affects economic indicators and is embedded in an overall research project, which aims to develop a sentiment index. A sentiment index is advantageous compared to other indices, like the Ifo index or the American Purchasing Manager Index (PMI), because it is less cost-intensive to collect sentiment data and the method reflects new information instantaneously.

This paper’s objective is to determine whether sentiment values are capable of explaining the performance of the CDAX and changes in macroeconomic variables. In Sect. 2, we review previous literature regarding sentiment analysis and indices. We introduce our research methodology in Sects. 3 and 4. In Sect. 5, we present our results and discuss their implications in Sect. 6, before concluding in Sect. 7.

2 Theoretical Background

2.1 Sentiment Analysis

As Arthur [1] has pointed out, judgment regarding current and future economic situations is highly influenced by subjective beliefs. Each individual forms expectations and hypotheses about the surrounding environment, which in turn provide the basis for economic decision-making. Moves on stock exchanges, therefore, result from market participants’ perceptions of the future. Soroka [33] studies the impact of “good” and “bad” economic news and arrives at the conclusion that the reaction to bad news is considerably stronger.

Tetlock et al. [35] show that the influence of media on financial markets is substantial. Elevated levels of pessimism in the news exert downward pressure on exchanges. Nevertheless, a price correction induced by negative emotions is reversed relatively quickly as the market once again focuses on fundamentals. Tetlock supports the noise trader theory, which is described in greater detail below. Broadly, it postulates that market actors do not trade exclusively on fundamental data, but that a portion of traders decide between buying and selling based on the news and market sentiment.

Contemporary economic literature shows that news influences individual expectations which, in turn, propel the stock indices. For that reason, the research field in question is of importance when it comes to studying fluctuations in stock prices. The process of surveying, analyzing and evaluating stock exchange sentiment is called sentiment analysis. This branch of research aspires to transform the news into numerical values using text and data mining approaches [21].

Ad hoc publications usually contain information about topics such as dividends, profit warnings, management changes or acquisition and divestiture activities. To duly evaluate publicly listed enterprises, analysts study all available information. Every public company publishes consolidated financial statements yearly, business reports quarterly, and many other releases about management, M&A and business strategy on an event-driven basis. One of the greatest challenges in terms of sentiment analysis, therefore, is to evaluate this variety of textual data [20].

Two factors that have promoted growth in this field of research are a tremendous increase in readily available electronic information due to the expansion of the Internet and advances in textual analysis techniques, which have made it possible to effectively study large bodies of textual information [20]. The intricacies of text analysis are a direct consequence of the polyvalent meanings of words when surrounded by other words. If a publication’s title bears the name of a serious disease, it is weighted negatively. Yet, if the same disease is contextually related to a new drug developed by a company, the news item must be evaluated positively [22]. The value and relevance of the real-time analysis of data is evident and will doubtless reshape research methods in all branches of science [36]. In economics, sentiment analysis could soon contribute significantly to our understanding of economic developments.

2.2 Economic Indicators

Companies make investment decisions on the basis of their estimates of the future economic situation. If they anticipate an upswing, investments are likely to increase more – ceteris paribus – than if they predict a recession in which revenues would be lower. In every meeting held to discuss interest rates, central bankers should take into consideration how different macroeconomic aggregates will behave [2]. As realizations and expectations of inflation involve a larger number of determinants than merely GDP and unemployment – e.g. capacity utilization rate, home and asset prices, etc. – it is difficult to recognize which variables will end up in the central bank’s model [6].

As Hayek states, markets are the most efficient ways of aggregating dispersed information [15]. If all necessary information is fully reflected in the prices, a market is considered efficient [7,8,9]. Friedman further popularized this concept in financial research, formalizing it in the efficient market hypothesis (EMH) [8, 11]. But regarding the predictability of stock prices, there has been no unanimous point of view since the formulation of the EMH.

Logically, the EMH states that stocks are always fairly priced, and that it is impossible for market participants to score better returns than those of a benchmark index. With the throw of a dart, a chimpanzee would fare equally well (or poorly) as all educated stock market experts in naming a number of better-than-benchmark portfolios [24]. Malkiel demonstrates that the top 10 mutual funds that had beaten the returns of the S&P 500 in the 1960s, some by as much as double, lagged behind the same index in the 1970s [25]. Therefore, the current performance of the best mutual funds does not provide any clue about their future returns.

In “Random Walk Theory”, Malkiel argues that stock prices incorporate every piece of information available during the day [25]. The EMH presumes perfect competition and complete information, an assumption which is often criticized. Likewise, there is no consensus about how far ahead macroeconomic developments can be forecast. Elliot and Timmermann [6] conclude that time series models are unstable over time, and argue that economic forecast models should rather be perceived as approximations.

Economic forecasts can be based on surveys (Ifo index, ZEW economic outlook) or on real-time data (sentiment analysis). The latter method takes a different approach from that of the two most followed indicators in Germany. The boom in the technical advancement of information systems has considerably enriched the palette of approaches accessible to analysts in their research [6]. Due to a steady supply of real-time data, the accuracy of models can be tested more rapidly and efficiently. Thus, their performance can be proved or refuted in a timely fashion. With econometric models, it is worth noting that correlations are susceptible to change over a study period as a result of modifications in the framework (regulations, institutions, technology, etc.) or relevant occurrences (terror attacks, natural disasters, etc.). Hence, this type of model should allow for instability in its parameters. In general, there does not seem to be a single forecast model that produces reliable projections in practice [6].

Despite the constraints of the efficient market hypothesis, analysis of sentiment factors still offers a benefit that should not be underestimated. The efficiency of markets is crucially dependent on the speed of information processing. Data mining and text mining with regard to sentiment values make it possible to evaluate a news item almost at the moment of its release. In contrast, analysts and investors need considerably more time to grasp new information, evaluate it accordingly and effect a trade in the market [21]. Among other advantages, such a time edge makes sentiment analysis an approach worth pursuing.

3 Research Hypotheses

The sentiment of financial news is a potential measure of the optimism or pessimism of economic actors. Thus, it could serve as a leading indicator for other economic variables, such as stock indices, but also for macroeconomic indicators such as GDP or the unemployment rate. In this paper, we study the influence of sentiment on different economic measures in order to inform the selection of relevant economic indicators for a sentiment index, which we develop in a parallel paper. In this context, we develop the following hypotheses.

Since the dataset on market sentiment values consists of ad hoc news publications, there are reasonable grounds to assume that these news items exhibit a positive correlation with the CDAX, and that the respective sentiment values are capable of explaining the stock market performance. This hypothesis stems from the fact that the valuation of stocks is always influenced by market sentiment and expectations [30]. Siering followed a similar approach by testing whether positive news items have a greater impact on stock movements than negative ones [32].

Hypothesis 1 (H1): The constructed sentiment index is positively correlated with the performance of the CDAX.

Similar to the Ifo index and the ZEW economic outlook indicator, it is probable that monthly sentiment values demonstrate a relationship with economic development [18]. We additionally expect sentiment to explain macroeconomic indicators as our sentiment variable reflects the sentiment of all stock-relevant announcements from companies representing the leading German stock market, the CDAX. Therefore, this corporate news flow represents a major part of German business and is thus a good indicator for the pessimism or optimism prevalent among German companies. Our analysis will determine whether our sentiment variable is capable of explaining changes in macroeconomic variables (incoming orders, and unemployment level) and thus serve as a forecast indicator for fluctuations in economic activity.

Hypothesis 2 (H2): News sentiment correlates positively with the macroeconomic variable “incoming orders”.

Hypothesis 3 (H3): News sentiment correlates negatively with the macroeconomic variable “unemployment rate”.

4 Methodology

4.1 News Corpus

This section explains the choice of German ad hoc announcements in English as our underlying news corpus. To this end, we briefly recapitulate the corresponding publication regulations: to prevent insider trading and assure the equal availability of novel, stock-price-relevant information to all market participants, stock-listed firms are subject to disclosure regulations. Disclosures must meet specific criteria regarding their content. However, the criteria do not regulate which words may be used. Thus, the choice of words (as the source of the conveyed sentiment) is left to the discretion of the news originator.

In Germany, the legal obligations ensure that companies disclose stock-price-relevant company information in the form of so-called ad hoc announcements [13, 14, 28]. The publications must include financial results, changes of top management, M&A transactions, major orders, dividends, and litigation outcomes. These regulated ad hoc disclosures are usually published in German and, more importantly for our analysis, in English. Each disclosure contains approximately 10–20 lines of free text. In contrast to the SEC-regulated publications in the US, companies in Germany have to file any new information immediately. This fact makes such disclosures highly relevant to stock market participants since the information they contain is definitively novel in nature [28]. In addition, ad hoc announcements need to be authorized by executives of the releasing company and the ad hoc filing is quality-checked by the Federal Financial Supervisory Authority. Several publications have assessed their importance to stock market reactions and suggested a direct relationship with stock market returns [28].

Our news corpus consists of German regulated ad hoc announcements from between January 2004 and June 2011. As a requirement, each announcement must be written in English. In addition, we remove so-called penny stocks with a value below 5 Euro, since these react less systematically to financial disclosures. In total, our final corpus consists of 14,427 ad hoc announcements.

4.2 News Sentiment Analysis

Sentiment analysis refers to analytical methods that measure the positivity or negativity of the content of textual data sources. In this way, sentiment analysis can shed light on how human agents process and respond to the textual content of news.

Before investigating the differential information processing of news sentiment, we need to pre-process our news corpus according to the following steps:

  1. Tokenization: first, tokenization splits running text into single words, named tokens [10, 26].

  2. Negation inversion: we then account for negations using a rule-based approach to detect negation scopes and invert the meaning accordingly [4, 31].

  3. Stop word removal: in a next step, we remove so-called stop words, which are words without relevance, such as articles and pronouns [19].

  4. Stemming: finally, we perform stemming in order to truncate all inflected words to their stem using the Porter stemming algorithm.
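The four steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the negation and stop-word lists, the fixed two-token negation scope, the `not_` prefix and all function names are assumptions, and a crude suffix stripper stands in for the Porter stemmer.

```python
import re

# Illustrative lists only; the paper does not publish its exact word lists.
NEGATIONS = {"not", "no", "never", "without"}
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "it", "we", "our"}
# Crude stand-in for the Porter stemmer: strip one common suffix.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Step 1: split running text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def mark_negations(tokens, scope=2):
    """Step 2: drop each negation word and prefix the next `scope` tokens with 'not_'."""
    out, remaining = [], 0
    for tok in tokens:
        if tok in NEGATIONS:
            remaining = scope
        elif remaining > 0:
            out.append("not_" + tok)
            remaining -= 1
        else:
            out.append(tok)
    return out


def base(token):
    """Return the token without a leading negation marker."""
    return token[len("not_"):] if token.startswith("not_") else token


def remove_stop_words(tokens):
    """Step 3: drop tokens whose base word is a stop word."""
    return [t for t in tokens if base(t) not in STOP_WORDS]


def stem(token):
    """Step 4: strip one suffix from the base word, keeping any negation marker."""
    word = base(token)
    prefix = token[: len(token) - len(word)]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return prefix + word


def preprocess(text):
    """Run the four pre-processing steps in the order given in the text."""
    tokens = remove_stop_words(mark_negations(tokenize(text)))
    return [stem(t) for t in tokens]
```

In a real pipeline, an established Porter stemmer implementation and a rule-based negation-scope detector would replace the simplified helpers used here.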

After completing the pre-processing, we can study the influence of news sentiment on financial markets. For this purpose we choose the Net-Optimism metric [5] combined with Henry’s Finance-Specific Dictionary [16], since this is a common sentiment approach that leads to a robust relationship [23]. The Net-Optimism metric S(A) is given by the difference between the number of positive \(W_{pos}(A)\) and negative \(W_{neg}(A)\) words divided by the total number of words \(W_{tot}(A)\) of an announcement A. Let us introduce the variables denoting news sentiment formally by

$$\begin{aligned} \text {S(A)} = \frac{W_{pos}(A) - W_{neg}(A)}{W_{tot}(A)} \in [-1;+1]. \end{aligned}$$

In addition, we aggregate the sentiment on a monthly basis across all ad hoc announcements within a given month.
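The Net-Optimism formula and the monthly aggregation can be illustrated with a short sketch. The word sets below are placeholders and not Henry's dictionary, the function names are hypothetical, and the monthly value is taken as the arithmetic mean of S(A) over the announcements of a month.

```python
from collections import defaultdict

# Placeholder word sets; Henry's finance-specific dictionary is not reproduced here.
POSITIVE = {"growth", "increase", "profit", "strong"}
NEGATIVE = {"loss", "decline", "weak", "risk"}


def net_optimism(tokens):
    """S(A) = (W_pos - W_neg) / W_tot for the tokens of one announcement."""
    if not tokens:
        return 0.0  # assumption: an empty announcement is treated as neutral
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def monthly_sentiment(announcements):
    """Average S(A) per month; `announcements` holds ('YYYY-MM-DD', tokens) pairs."""
    by_month = defaultdict(list)
    for date, tokens in announcements:
        by_month[date[:7]].append(net_optimism(tokens))
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}
```

For example, an announcement with two positive words and one negative word among four tokens yields S(A) = (2 - 1) / 4 = 0.25. How negated tokens are scored against the dictionary is not specified in the text and is left out of this sketch.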

4.3 Methodological Approach

To identify possible correlations between the sentiment values and CDAX performance, we initially compare data on a firm-to-firm basis, aggregate it afterwards on daily and monthly scales, and then test it for interconnections. For the purpose of demonstrating whether sentiment values are able to explain macroeconomic variables, a considerable amount of external data (new industrial orders, unemployment rate) is processed as well.

The first hypothesis is primarily evaluated through correlation tests and OLS regression. The Durbin-Watson and Breusch-Pagan tests are applied to check for autocorrelation and heteroscedasticity, respectively. A regression with Newey-West-corrected standard errors produces more reliable inference when either is present. Furthermore, by adding monthly lags, we estimate the extent to which sentiment values could be an early indicator for shifts in the dependent variables. The sentiment values, augmented by time lags, are treated as explanatory variables for the CDAX.
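To make the lag construction and the autocorrelation check concrete, the following sketch fits a one-regressor OLS model, computes the Durbin-Watson statistic from its residuals, and aligns sentiment with a dependent variable k months later. It is an illustration under simplifying assumptions (a single regressor, hypothetical function names), not the specification estimated in the paper.

```python
def ols_simple(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, resid


def durbin_watson(resid):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2; values near 2 suggest no first-order autocorrelation."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


def lag_pairs(x, y, k):
    """Pair x[t-k] with y[t], so that x leads y by k periods."""
    if k == 0:
        return list(x), list(y)
    return list(x[:-k]), list(y[k:])
```

Calling `ols_simple(*lag_pairs(sentiment, cdax, k))` for k = 0, 1, 2, ... would give one slope per lag; a Durbin-Watson value far from 2 is the kind of signal that motivates the Newey-West correction.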

Hypotheses H2 and H3 are evaluated by Newey-West-corrected OLS regressions. The GDP, being the most important metric, cannot be a proper dependent variable because it is calculated quarterly. As an approximation of developments in GDP we use incoming orders. Another important macroeconomic indicator is the unemployment rate in Germany. These variables are examined for correlations with the sentiment index on a monthly basis. The dataset analyzed in this paper comprises a survey of ad hoc CDAX notifications from between 2004 and mid 2011. The CDAX, in contrast to the German DAX index, includes not only the 30 largest “blue chip companies”, but over 400 small and medium-sized businesses, which makes it a better representative of Germany’s economy. Constituents of the CDAX are all domestic stocks traded on the Frankfurt Stock Exchange, with foreign stocks falling outside the index range. Since the available data derives from German companies only, hypotheses H2 and H3 are tested only in relation to German macroeconomic developments.

5 Results

5.1 Descriptive Statistics and Correlation Analysis

The corresponding descriptive statistics and variables used in the regression models are provided in Table 1. Our approach is to first test the correlation between sentiment values and CDAX data on an individual level.

Table 1. Descriptive statistics of time series (January 2004–June 2011).

Table 2 displays correlations between news sentiment and the CDAX, using the arithmetic mean as the aggregation method. The result indicates a weakening of the sentiment effect upon aggregation. Sentiment values are highly correlated with the CDAX on a daily basis (p-value < 0.001) and still correlated at a statistically significant level on a monthly basis (p-value < 0.05). Since the data for the macroeconomic indicators evaluated under hypotheses 2 and 3 is only available on a monthly basis, we refer in the following sections to monthly data. As the daily aggregated sentiment values of the ad hoc announcements of companies listed on the CDAX have a larger correlation coefficient and are statistically more significant (p-value smaller than 0.01) than monthly aggregated sentiment values (p-value smaller than 0.05), a sentiment index appears to carry more statistical information on a daily basis. We build on this finding in a parallel paper, which focuses on the implementation of a daily sentiment index [17].
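A correlation test of this kind can be sketched as below. The text does not state which correlation coefficient or significance test was used, so the Pearson coefficient and the associated t-statistic are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t-statistic for H0: zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```

Because the t-statistic grows with the number of observations n, the same correlation coefficient is more significant on daily data than on monthly data, which has far fewer observations over the same period.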

Table 2. Correlations between news sentiment and CDAX.

5.2 The Effect of Sentiment on the CDAX

The following analysis tests whether the sentiment values are able to explain movements of financial markets (e.g. the CDAX).

Fig. 1. Normalized sentiment and CDAX on a monthly basis (Jan. 2004 = 100).
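The rebasing used for this figure (first observation = 100) amounts to a simple rescaling. The helper below is a hypothetical sketch; it assumes the first observation is non-zero, which the text does not discuss.

```python
def rebase_to_100(series):
    """Scale a series so that its first observation equals 100."""
    base = series[0]
    if base == 0:
        raise ValueError("first observation must be non-zero to rebase")
    return [100.0 * value / base for value in series]
```

Rebasing changes only the scale of each series, so levels of sentiment and of the CDAX can be plotted on a common axis without altering their correlation.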

Elliot concludes that it is necessary to analyze not only the whole time frame, but also to consider relations between variables within specific periods if there are macroeconomic shocks or major changes in institutional and legal frameworks [6]. As the financial crisis of 2007/08 was a significant macroeconomic shock, our analysis is subdivided into two periods: 2004 – Dec. 2007 and Dec. 2007 – 2011. This division is based on the fact that December 2007 represents the official beginning of the financial crisis according to the National Bureau of Economic Research [29] and marked a considerable change in the macroeconomic environment.

Sentiment values range within a certain interval, too, and for convenience we normalize the first monthly value in 2004 to 100. Figure 1 illustrates this, with sentiment values shown as levels (black line), not percentage changes. For this reason, we compare the sentiment values to the level of the CDAX itself (blue line), not to its percentage changes. Below is the OLS regression model with which the defined hypothesis will be tested:

$$\begin{aligned} C(t) = \beta _{0} + \beta _{1}S(t) + \beta _{2}\alpha {(t)} + \beta _{3}AR(t) + \beta _{4}CAR(t) + \beta _{5}PB(t) \end{aligned}$$
(1)

where C(t) denotes the CDAX and S(t) the sentiment variable, our independent variable of interest. In addition, the model includes several control variables: the market alpha \(\alpha \)(t), abnormal returns AR(t), the cumulative abnormal return CAR(t) and the price-to-book ratio PB(t). Results of the autocorrelation and heteroscedasticity tests are listed in Table 3 and suggest that both effects are present in the dataset and that the initial regression should be corrected. Table 4 summarizes the results of the regression with Newey-West-corrected standard errors, which are robust to autocorrelation and heteroscedasticity.
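For reference, a common textbook form of the Newey-West covariance estimator is given below. The notation is an assumption introduced for illustration and does not appear in the paper: \(x_t\) is the vector of regressors of Eq. (1) including the constant, \(e_t\) is the OLS residual, T is the number of observations, and L is the truncation lag, whose value the text does not report.

$$\begin{aligned} \widehat{V}(\hat{\beta }) = (X'X)^{-1}\, \hat{S}\, (X'X)^{-1}, \quad \hat{S} = \sum _{t=1}^{T} e_t^{2}\, x_t x_t' + \sum _{l=1}^{L} \Big (1 - \frac{l}{L+1}\Big ) \sum _{t=l+1}^{T} e_t e_{t-l} \big (x_t x_{t-l}' + x_{t-l} x_t'\big ) \end{aligned}$$

The coefficient estimates are identical to those of plain OLS; only the standard errors, and hence the p-values, change.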

Table 3. Results of a test on autocorrelation and heteroscedasticity of certain variables.
Table 4. Results of a Newey-West corrected regression of sentiment index and CDAX for several time periods.

The analysis shows a statistically significant coefficient of sentiment for the period after the financial crisis (p-value < 0.001). If we take all six sentiment lags into account, we also get significant results for Lag 2 through 4, which corroborates the previous lag analysis presented in Table 4. The sentiment index is able to explain CDAX movements over the next 4 months. Hypothesis (H1) therefore cannot be rejected for the (post-) crisis observation period, whereas it has to be rejected for the previous time frame and also for the whole period of 2004–2011, since the results are not statistically significant within these time spans. An explanation for the results after the financial crisis may be that financial news was keenly watched due to the massive impact of collapsing markets and high volatility in the aftermath [3].

5.3 Sentiment Index and Macroeconomic Indicators

The Ifo index and ZEW economic outlook are said to be able to forecast certain economic developments. The prediction of the near future is defined as nowcasting. In this section, we test whether the news sentiment index possesses a similar nowcasting potential as the Ifo index and ZEW outlook [12, 27] by studying the effect of news sentiment on certain economic indicators.

Entrepreneurs usually make investment decisions based on forecasts of the future economic situation, and the current environment feeds into such predictions. If the economy is in a recession at present, businessmen tend towards pessimism in the short term and refrain from embarking on new projects. We also suppose the inverse: if the news and market sentiment are optimistic, businesses should be eager to invest. In macroeconomics, the most influential metric is, of course, the GDP of a country. However, this metric is estimated only quarterly, and thus we have to resort to another economic indicator, incoming orders, which we use as an approximation of German economic development. Tests are carried out on a monthly basis.

Fig. 2. Normalized sentiment, incoming orders in the industrial sector (green) and unemployment rate (blue). (Color figure online)

In principle, our analysis in this part follows the approach of Huefner [18], whereby the incoming industrial orders, IO(t), function as the dependent variable. Furthermore, the regression below will also check whether the sentiment index can explain fluctuations in the unemployment rate, UR(t). With independent variables remaining the same, the updated model may be seen below:

$$\begin{aligned} IO(t)/UR(t) = \beta _{0} + \beta _{1}S(t) + \beta _{2}\alpha (t) + \beta _{3}AR(t) + \beta _{4}CAR(t) + \beta _{5}PB(t) \end{aligned}$$
(2)

Building on Fig. 2, one may suppose a positive correlation between the two datasets. The sentiment values will be colored in black and macroeconomic variables in blue and green in all figures that follow. For the sake of convenience, the datasets have been normalized, and graphs start at 100 in 2004. It is logical to assume that our regression should be more successful at explaining incoming orders than, for example, industrial production. Due to purchasing and planning procedures, industrial production is always protracted. The outcome in Table 5 shows that we obtain slightly significant positive values for the sentiment index in column 5 (p-value < 0.05).

By adding control variables three to five, the sentiment index shows a significant effect on incoming industrial orders. Therefore, we cannot reject Hypothesis (H2).

Finally, we set the unemployment rate as the dependent variable. In this case, the coefficient for sentiment in the first column is significantly positive (p-value < 0.001), and remains significant when further control variables are added (Table 6). The adjusted \(R^{2}\) indicates that the model explains 22.6% of the variation in the unemployment rate.

The result is rather counter-intuitive, as we expect unemployment to decrease when sentiment is positive. A possible explanation is that news sentiment has been rather negative since the financial crisis, whereas German unemployment, unlike that of most other European countries, did not increase but fell in the aftermath of the crisis. The political measures taken in the wake of the crisis did not lead to massive unemployment but, on the contrary, to a further decrease, which might explain the positive rather than negative correlation in this analysis.

Table 5. Results of the Newey-West OLS regression for the relationship between the new industrial orders and the sentiment index.

The results of our analysis confirm that sentiment values can explain changes in the unemployment rate. However, we have to reject Hypothesis (H3), since the coefficient of sentiment is positive, and not negative as hypothesized.

Table 6. Results of the Newey-West OLS regression for the relationship between unemployment rate and the sentiment index.

This result seems counterintuitive at first glance, but may be due to the special situation of the German labor market during the observation period: unemployment was relatively high at the beginning of our observation period, when the news sentiment was also relatively high, as companies had not yet been affected by the financial crisis. Later on, the unemployment rate was kept at lower levels during the financial crisis due to various labor market measures in Germany that contained the threat of rising unemployment during the crisis. At the same time, the sentiment was low during this period.

6 Discussion and Implications for Future Research

Our results demonstrate the polyvalence of the sentiment derived from the financial disclosures of stock-listed companies. This sentiment measure has a statistically significant effect on various economic indicators, including the German CDAX stock index, as well as macroeconomic indicators such as the level of new orders and the unemployment rate. For indicators that are available at daily resolution, it is advisable to evaluate the relationship on a daily basis, as the effect of sentiment on stock markets is more pronounced when looking at narrower time windows.

Thus, future research on this topic may evaluate shorter time windows, e.g. daily data, when constructing a sentiment index for the CDAX. In addition, in order to provide a useful tool for practitioners, researchers should also focus on developing an early-warning system that translates the sentiment index into a practical prediction tool. Within the broader horizons of our overarching research project, we have set out to refine the properties of such a sentiment index for stock markets [17].

7 Conclusion

Sentiment analysis can provide valuable insights into relevant information signals for investors, in addition to fundamental information sources. In comparison to other indices, a sentiment index benefits from faster processing of available information. We have also devised a method of creating an index from data on market sentiment, calculating it as an arithmetic mean of daily and monthly sentiment values. Our analysis reveals interesting results regarding the usage of a monthly sentiment index.

First, monthly sentiment values are able to explain movements in the leading German stock index, the CDAX, at statistically significant levels during and after the financial crisis, but not before. This confirms prior research in demonstrating that the perception of sentiment may be subject to shifts. Second, the constructed sentiment metric remains a statistically significant predictor of the CDAX for up to four time lags. Third, sentiment is also able to explain changes of relevant macroeconomic variables, namely the volume of incoming orders. Thus, our sentiment values explain both stock market movements and macroeconomic developments.