1 Introduction

With the advent of online technologies and social media, individuals are increasingly sharing their views and opinions through the Internet. Consequently, a wide variety of information sources are influencing individual-level decision making, which in turn affects behavior in business, socio-political and personal contexts. The increasing impact of public opinion is being felt by policy makers and business managers alike. As a result, monitoring and understanding opinion dynamics and taking appropriate corrective actions are emerging as dominant activities that can make or mar the future of any entity with a public interface. The past one and a half decades have witnessed the emergence of a number of techniques to detect, extract and classify opinions, sentiments and attitudes concerning different topics from large structured and unstructured textual content. These techniques, often called sentiment analysis or opinion mining, essentially focus on offering automated means to study people’s opinions, attitudes, and emotions toward entities, individuals, issues, events, topics and their attributes using computational and statistical approaches [1]. The sheer amount of data available from public sources makes it a formidable task to monitor and analyze the available information through purely manual effort. Moreover, human intervention also introduces various types of cognitive biases into the analysis process. Objective automated opinion mining tools (e.g. natural language processing, polarity analysis, textual analysis etc.) can help us overcome these inherent cognitive limitations of manual approaches and make effective decisions.

Therefore, it is not surprising that automated sentiment detection systems have emerged as an indispensable tool in diverse domains, serving a wide variety of goals such as assessment of political mood, development of market intelligence, determination of customer satisfaction, sales and business prediction, determination of financial market sentiment etc. [2,3,4]. For example, with the growth of e-commerce and review sharing platforms like Amazon, IMDB, epinions.com, TripAdvisor etc., business entities across the globe have started taking an active interest in using sentiment analysis and text mining techniques to understand consumers’ requirements and design their offerings accordingly. Opinions contributed by consumers in blogs, forums, and product related websites provide managers an avenue to learn consumers’ preferences, market trends and competitors’ reactions. Social networks (e.g. Facebook, Twitter etc.) are generating massive amounts of individual-level information about billions of users, which can be used to develop algorithms that predict behavioral patterns with remarkable accuracy. A similar trend is sweeping the socio-political scene, where online media is slowly displacing conventional communication channels [5, 6]. Common people are increasingly getting actively involved in issue based discussions and sharing their views in a way that can be accessed by millions of others. Emerging technologies are slowly creating an environment where individual-level opinions are increasingly initiating changes at a global scale. Much of the information generated in the online environment contains highly unstructured textual data that often requires context specific treatment. Therefore, there is a need to understand the various approaches that can be used to analyze and utilize such information. In the current paper we attempt to address this need by reviewing the state of the art in sentiment analysis, summarizing the recent trends and offering directions for further research.

This paper differs from existing reviews [1, 7, 8] in a number of ways. First, it offers a preliminary technical exposition of various sentiment analysis techniques following a simple classification scheme, so as to help potential future users of these techniques. Second, rather than giving an abstract overview of a large number of approaches, this paper discusses in greater detail some of the more recently proposed techniques that solve a set of problems in specific management domains. Third, it presents some examples of how combining sentiment analysis techniques with conventional econometric approaches can help us solve business specific problems.

2 Sentiment analysis

2.1 Definition

The main goal of opinion mining is to extract opinions from unstructured text using algorithmic, statistical or hybrid techniques. Sentiment analysis is essentially concerned with the following fundamental elements: the entity or target being evaluated (e.g. a hotel), the attribute of the target at which the opinion is directed (e.g. service quality or food), the polarity of the opinion (e.g. positive, negative or neutral), the opinion holder (e.g. the individual consumer) and the date when the opinion was expressed. Formally, an opinion is defined as a tuple \( \left( {e_{i} ,a_{ij} ,s_{ijkl} ,h_{k} ,t_{l} } \right) \), where \( e_{i} \) is the ith entity, \( a_{ij} \) denotes the jth aspect of entity \( e_{i} \) at which the opinion is directed, \( h_{k} \) is the kth opinion holder, \( s_{ijkl} \) is \( h_{k} \)’s opinion polarity (or sentiment) towards aspect \( a_{ij} \) of entity \( e_{i} \), and \( t_{l} \) is the time at which the opinion was expressed [9]. An opinionated document \( d \in D \) contains the opinions of a set of opinion holders about a number of entities. Therefore, the main objective of sentiment analysis is to find all the opinion tuples \( \left( {e_{i} ,a_{ij} ,s_{ijkl} ,h_{k} ,t_{l} } \right) \) in a given document, across a set of documents or across many sets of documents. As stated earlier, the opinion polarity \( s_{ijkl} \) is generally defined in terms of three levels: positive, negative or neutral.

2.2 Process flow

The basic process of sentiment analysis consists of a series of preliminary steps that include data acquisition, text pre-processing and feature selection. These initial steps are followed by the actual sentiment classification process. The overall process flow of a typical sentiment or opinion mining exercise is depicted in Fig. 1.

Fig. 1 Basic process flow of sentiment mining

2.2.1 Data acquisition

In the data acquisition phase, the corpus or the text documents are acquired using either an Application Programming Interface (API) or Web Crawlers [10]. In the API based approach, data is collected using a broad range of clients (e.g. browsers and mobile devices) through appropriate HTTP protocols. These data acquisition strategies are easy to implement; however, they may have accessibility limitations (e.g. the Twitter REST API imposes a rate limit of 180 requests per 15-min window). Moreover, many websites do not provide an API interface for public consumption. In contrast, crawler-based approaches offer virtually unrestricted access to topically-relevant content. However, the text collected through crawlers is often noisy, and its structure is prone to change whenever the design of the target website changes. Moreover, web crawlers must obey etiquette protocols by limiting the frequency of, and the overlap between, consecutive requests. These requirements impose a constraint on how much data can be collected within a given span of time. Apart from these two most popular approaches, many firms also generate a large amount of opinionated unstructured text through their customer interfaces. For example, many independent e-Sellers (e.g. Myntra in India), online marketplaces (Flipkart, Amazon etc.) and intermediate aggregators (e.g. TripAdvisor) provide in-built interfaces through which customers can contribute their opinions and reviews about different products. These interfaces can generate a large opinionated textual corpus that can be analyzed through sentiment classification in order to generate business relevant knowledge.
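As an illustration, a minimal Python sketch of a polite crawler-style acquisition loop is given below. The URL, pagination scheme and delay value are hypothetical placeholders rather than a real API; a production crawler would additionally honour robots.txt and the target site’s terms of use.

```python
# Minimal sketch of crawler-style review collection; the endpoint below is a
# hypothetical placeholder and the pagination scheme is assumed for illustration.
import time
import requests

BASE_URL = "https://www.example-reviews.com/product/123/reviews"  # hypothetical

def fetch_review_pages(pages, delay_seconds=2.0):
    """Fetch raw HTML for a few review pages, pausing between requests to
    respect crawler etiquette (limited request frequency)."""
    documents = []
    for page in range(1, pages + 1):
        response = requests.get(BASE_URL, params={"page": page}, timeout=10)
        if response.status_code == 200:
            documents.append(response.text)   # raw, noisy HTML to be cleaned later
        time.sleep(delay_seconds)             # throttle consecutive requests
    return documents

corpus_html = fetch_review_pages(pages=3)
```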

2.2.2 Text pre-processing

Data acquisition is followed by a text sanitization step that is intended to clean up the text and format it in a manner that can be used for the subsequent steps of feature extraction and sentiment/opinion classification. The usual steps involved in this stage are as follows (a brief code sketch illustrating these steps is given after the list):

  • Language detection In this stage the language used in the text is detected, and only the documents written in the relevant language are identified and extracted from the collection of text documents or corpora [11]. This is important because every language has its own structural characteristics. Consequently, natural language models are generally specific to a particular language, and these models require some degree of structural homogeneity for any meaningful analysis to be carried out.

  • Tokenization and lemmatization Most sentiment classification algorithms operate on a bag-of-words assumption whereby the document is represented as a sparse vector of occurrence frequencies of its vocabulary (words). This assumption disregards the grammatical structure and word order within the document [12]. Accordingly, in the tokenization step, the document is broken down into its constituent words. Subsequently, each of these words is converted to its invariant canonical form or “stem” in the lemmatization step. This conversion is usually achieved through a morphological analysis, but it can also be achieved through a less rigorous stemming heuristic (e.g. Porter’s stemmer [13]) that removes word affixes.

  • Stopword removal Many words (e.g. the, that, of etc.) in any language (e.g. determiners, coordinating conjunctions and prepositions in English) are used to impose structure rather than to contribute meaningfully to the underlying expression and emotion. These words are called stopwords, and they can be safely removed from a document without any significant detrimental impact on the final opinion analysis [14, 15]. The process of removing these words is called stopword removal. This step reduces the computational resources needed to analyze the corpora.

  • Part-of-speech (POS) tagging In this step each word is labeled with its associated part of speech (i.e. noun, verb, adjective, adverb or preposition). POS tagging is often useful for further processing like dependency parsing or machine learning. The identification of adjectives and adverbs is also sometimes useful for determining opinion polarity and subjectivity.
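The short sketch below illustrates these pre-processing steps using the NLTK toolkit. It is a minimal example, assuming the relevant NLTK resources (punkt, stopwords, averaged_perceptron_tagger) have already been downloaded; it uses Porter stemming rather than full morphological lemmatization.

```python
# Illustrative pre-processing pipeline with NLTK; assumes the 'punkt',
# 'stopwords' and 'averaged_perceptron_tagger' resources are installed.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

def preprocess(document):
    tokens = nltk.word_tokenize(document.lower())                # tokenization
    tokens = [t for t in tokens if t.isalpha()]                  # drop punctuation and numbers
    stop_words = set(stopwords.words("english"))
    content_tokens = [t for t in tokens if t not in stop_words]  # stopword removal
    pos_tags = nltk.pos_tag(content_tokens)                      # POS tagging
    stemmer = PorterStemmer()
    stems = [stemmer.stem(t) for t in content_tokens]            # Porter stemming heuristic
    return stems, pos_tags

stems, tags = preprocess("The rooms were spacious but the service was disappointing.")
print(stems, tags)
```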

As the process flow suggests, the text pre-processing step is followed by feature or aspect extraction and selection step. This step is presented below with a specific focus on the approaches that have been most frequently used in past research works.

2.2.3 Feature extraction/selection

The definition of sentiment classification (presented in Sect. 2.1) suggests that the main task of this process is to identify the polarity of an opinion held by an opinion holder (\( h_{k} \)) that is targeted towards a specific (jth) aspect \( a_{ij} \in \left\{ {a_{i1} , \, a_{i2} , \ldots ,a_{iJ} } \right\} \) of target entity \( e_{i} \). Therefore, one of the primary tasks is to identify these aspects. The feature selection or aspect identification process is mainly concerned with this goal. Most of these approaches aim to identify a smaller subset of variables that can efficiently describe the underlying characteristics of the input data. The main approaches to feature selection are as follows.

  • Criterion based approaches

In these approaches, words are scored according to a ranking criterion that measures their relevance, and all words with a score below a predefined threshold are removed. Therefore, a feature selection criterion must be chosen that can measure the relevance of each word with reference to the output classes/labels. One way to define the relevance of a word is in terms of its conditional dependence: a word that is conditionally independent of the output class labels can be considered irrelevant. Therefore, an appropriate ranking criterion can be expressed in terms of the interdependence between the class labels and the word under consideration. Accordingly, the most frequently used selection criteria are based on correlation, pointwise mutual information and the Chi square statistic [16]. As expected, each of these criteria essentially measures the extent of dependency between the class labels and the target variable or word. The correlation based criterion is defined as follows:

$$ R(i) = \frac{{\text{cov} (x_{i} ,Y)}}{{\sqrt {\text{var} (x_{i} ) \cdot \text{var} (Y)} }}, $$
(1)

where \( x_{i} \) is the \( i{\rm th} \) variable (word), Y is the output (class labels), \( cov() \) indicates the covariance and \( var() \) denotes the variance. The next important feature selection criterion uses the Chi square statistic as a measure of dependency between the class labels and the target word. The Chi square statistic between word x and class k is defined as:

$$ \chi^{2} = \frac{{n \cdot F(x)^{2} \cdot \left( {p_{k} (x) - P_{k} } \right)^{2} }}{{F(x) \cdot \left( {1 - F(x)} \right) \cdot P_{k} \cdot (1 - P_{k} )}}, $$
(2)

where n is the total number of documents; \( p_{k} (x) \) is the conditional probability of class k for documents containing word x; \( P_{k} \) is the overall fraction of documents belonging to class k; and \( F(x) \) is the overall fraction of documents containing word x.

The pointwise mutual information criterion also measures the mutual dependency between two variables, but in an information theoretic manner. Given that the Shannon entropy, or the uncertainty associated with the output Y, is defined as \( H(Y) = - \sum {p(y)\log (p(y))} \), the corresponding conditional entropy can be expressed as \( H(Y|X) = - \sum\nolimits_{x} {\sum\nolimits_{y} {p(x,y)\log (p(y|x))} } \). Consequently, we can measure the extent of uncertainty reduction (\( I(Y,X) \)) in output Y from observing variable X as \( I(Y,X) = H(Y) - H(Y|X) \). This provides a way to estimate the pointwise mutual information \( \left( {M_{k} (x)} \right) \) as the ratio of the observed co-occurrence \( \left( {F(x) \cdot p_{k} (x)} \right) \) of class k and word x to their expected co-occurrence under independence \( \left( {P_{k} \cdot F\left( x \right)} \right) \). Hence, we can write:

$$ M_{k} (x) = \log \left( {\frac{{p_{k} (x)}}{{P_{k} }}} \right). $$
(3)

The pointwise mutual information is positive when word x and class k are positively correlated and negative when they are negatively correlated. A distance based measure of mutual information can also be derived using the Kullback–Leibler divergence between two densities defined by the probability density functions \( f( \cdot ) \) and \( g( \cdot ) \):

$$ K(f,g) = \sum\limits_{y} {f(y)\log \left( {\frac{f(y)}{g(y)}} \right)} . $$
(4)

The ranking criterion based approach to feature selection is computationally less demanding and does not suffer from the problem of overfitting. However, it suffers from the problem of redundancy, in that it often identifies subsets with more than the optimum number of variables needed to adequately describe the underlying characteristics of the input data.
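A small illustration of criterion based selection is sketched below, using the Chi square score as implemented in scikit-learn; the toy corpus, labels and the choice of k are assumptions made purely for demonstration.

```python
# Chi-square based word selection sketch over a toy labelled corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["great food and friendly staff", "terrible service and cold food",
        "friendly staff, great value", "cold room and terrible staff"]
labels = [1, 0, 1, 0]                               # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)                  # document-term count matrix

selector = SelectKBest(score_func=chi2, k=4)        # keep the 4 highest-scoring words
X_reduced = selector.fit_transform(X, labels)

selected_words = [w for w, keep in zip(vectorizer.get_feature_names_out(),
                                       selector.get_support()) if keep]
print(selected_words)
```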

  • Latent semantic analysis

Latent semantic analysis is an unsupervised learning technique that aims to uncover underlying similarity structures by first creating a rectangular term-document matrix (\( X_{t \times d} \)) from a large collection of text, where the rows represent individual words, the columns represent the documents and individual cells show the frequency with which a specific term occurs in a document. Subsequently, these frequencies are transformed into inverse document frequency or entropy-based scores and a reduced-rank or truncated singular value decomposition (i.e. \( X \approx T_{k} \times S_{k} \times D_{k}^{T} \)) is applied to this matrix. The k largest singular values and their associated vectors generated by the Singular Value Decomposition (SVD) process are retained so as to represent each document and term as a k-dimensional vector in the derived space. Specifically, the rows in \( T_{k} \) represent the term vectors and the rows in \( D_{k} \) represent the document vectors in a reduced latent semantic space. Finally, similarities among entities (e.g. document–document, term–term and term–document) are computed in this reduced-dimensional space. In effect, this method transforms the text space into a new axis system that explains the variation in the underlying attribute values in terms of linear combinations of the original word features. The main disadvantage of latent semantic analysis is that it may not necessarily discover those features that would lead to the best separation of the underlying document class-distributions.
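A minimal latent semantic analysis sketch is shown below: TF-IDF weighting followed by a truncated SVD, with document similarities computed in the reduced space. Note that scikit-learn arranges documents as rows (the transpose of the term-document matrix \( X_{t \times d} \) described above); the toy corpus and the choice of two latent dimensions are illustrative assumptions.

```python
# Latent semantic analysis sketch: TF-IDF weights followed by truncated SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the hotel staff was helpful", "helpful and courteous staff",
        "the food was bland", "bland food and slow service"]

X = TfidfVectorizer().fit_transform(docs)        # documents as rows, terms as columns
svd = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = svd.fit_transform(X)               # documents in the reduced latent space

print(cosine_similarity(doc_vectors))            # document-document similarities
```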

2.2.4 Sentiment analysis

Once the data has been collected and pre-processed and a set of appropriate features or aspects has been identified, sentiment analysis can be applied to find the opinion polarity \( s_{ijkl} \). Sentiment analysis is essentially a process of classifying a given text into two or more (e.g. “positive/negative” or “thumbs up/thumbs down”) opinion categories. It can also produce ordinal outputs, such as the number of “stars” for a product. Moreover, opinion determination can happen at various levels of granularity such as words, sentences and documents. Sentiment classification can be performed using either lexicon-based or machine learning (ML) approaches. The lexicon-based approach requires human intervention to create a set of annotated seed words, which are then used in a bootstrapping method that relies on synonym detection algorithms to create a larger lexicon. This collection of known and precompiled sentiment terms is called a sentiment lexicon. Subsequently, either the strength or the probability of occurrence of a sentiment word can be used for sentiment classification; the first approach is called dictionary-based while the second is called corpus-based. However, it must be noted that the required manual intervention makes lexicon-based approaches costlier and less appropriate for most large-scale sentiment mining exercises. The Machine Learning (ML) approach, on the other hand, uses linguistic features of the text and can be further categorised based on whether it uses annotated data for training the classifier (supervised ML) or not (unsupervised ML). Following Medhat et al. [17], a broad description of available classification schemes is presented in Fig. 2. We describe each of these approaches in greater detail in the subsequent subsections.

Fig. 2 Approaches to sentiment analysis (reproduced with permission from Medhat et al. [17])

  • Machine learning based sentiment classification

Supervised ML approaches Supervised machine learning sentiment classifiers can be categorized into two broad classes: linear classifiers and probabilistic classifiers. Under the linear classifier category, we have support vector machine and neural network based approaches. Under the probabilistic classifier category, three major approaches are available: Naïve Bayes classifiers, Bayesian networks and maximum entropy based classifiers.

Support vector machines (SVM) SVMs were developed from Statistical Learning Theory [18]. This is essentially a class of linear algorithms that tries to find a hyperplane that can optimally separate the data into two or more classes. For a given n-dimensional input vector \( \vec{x}_{i} = (x_{i1} ,x_{i2} , \ldots ,x_{in} ) \), a weight vector \( \vec{w} = (w_{1} ,w_{2} , \ldots ,w_{n} ) \) and an output value \( y_{i} \), the derived hyperplane can be defined as:

$$ \vec{w} \cdot \vec{x}_{i} - b = 0. $$
(5)

The weight vector \( \vec{w} = (w_{1} ,w_{2} , \ldots ,w_{n} ) \) is determined using an appropriate training process, and given these weights, the classification of a new input vector \( \vec{x}_{i} \) can be carried out based on whether \( \vec{w} \cdot \vec{x}_{i} - b \ge 0 \) or not. We want to maximize the distance between the parallel hyperplanes (the margin) by choosing appropriate values of w and b (Fig. 3). These hyperplanes can be described by the equations:

$$ \begin{aligned} \vec{w} \cdot \vec{x}_{i} - b = 1,\;{\text{and}} \hfill \\ \vec{w} \cdot \vec{x}_{i} - b = - 1 \hfill \\ \end{aligned} $$
(6)
Fig. 3 Support vector machine (adapted from Wikipedia)

In order to have sufficient separation between the identified classes, we additionally impose the following constraint:

$$ \vec{w} \cdot \vec{x}_{i} - b\begin{array}{*{20}l} { \ge 1} \hfill & {{\text{if}}\quad y_{i} = + 1} \hfill \\ { \le - 1} \hfill & {{\text{if}}\quad y_{i} = - 1} \hfill \\ \end{array} . $$
(7)

Learning the SVM can now be formulated as a quadratic optimization problem subject to linear constraints, which has a unique minimum:

$$ \mathop {\hbox{min} }\limits_{w,b} \frac{1}{2}\left\| w \right\|^{2} \quad {\text{subject}}\;{\text{to}}\;y_{i} \left( {\vec{w} \cdot \vec{x}_{i} - b} \right) \ge 1\quad {\text{for}}\;i = 1,2, \ldots ,N. $$
(8)

Introducing Lagrangian multipliers \( \alpha_{i} \ge 0 \), the Lagrangian can be written as \( L(\alpha ,w,b) = \frac{1}{2}\left\| w \right\|^{2} - \sum {\alpha_{i} \left[ {y_{i} \left( {\vec{w} \cdot \vec{x}_{i} - b} \right) - 1} \right]} \). The necessary conditions for this optimization are \( \frac{\partial L(\alpha ,w,b)}{\partial w} = w - \sum {\alpha_{i} y_{i} x_{i} } = 0 \) and \( \frac{\partial L(\alpha ,w,b)}{\partial b} = \sum {\alpha_{i} y_{i} } = 0 \). Therefore, we get:

$$ \mathop {\hbox{min} }\limits_{w,b} \left\{ {L(\alpha ,w,b)} \right\} = \sum {\alpha_{i} - \frac{1}{2}} \sum\limits_{i,j} {\alpha_{i} \alpha_{j} y_{i} y_{j} x_{i}^{T} x_{j} } = \tilde{L}(\alpha ). $$
(9)

Therefore, we need to find \( \mathop {\hbox{max} }\limits_{\alpha } \tilde{L}(\alpha )\, \) subject to the conditions \( \alpha_{i} \ge 0 \) and \( \sum {\alpha_{i} y_{i} = 0} \). The corresponding value of w is given by \( \sum {\alpha_{i} y_{i} x_{i} } \), and \( b = w \cdot x_{i} - y_{i} \) for any support vector \( x_{i} \) with \( \alpha_{i} > 0 \). The classical SVM formulation assumes that the dataset is linearly separable by a single hyperplane. For non-linear datasets, a kernel function can be used to map the data to a higher dimensional space in which it is linearly separable, and the classical SVM machinery can then be used to construct a hyperplane in this higher dimensional feature space. In other words, the product \( x_{i}^{T} x_{j} \) in Eq. (9) can be replaced by an appropriate kernel function \( K(x_{i} ,x_{j} ) \) such as a polynomial kernel (\( K(x_{i} ,x_{j} ) = \, (x_{i}^{T} x_{j} + \, 1)^{p} \) for \( p > 0 \)) or a Gaussian RBF kernel (\( K(x_{i} , \, x_{j} ) = \text{e}^{{ - \gamma \left\| {x_{i} - x_{j} } \right\|^{2} }} \) for \( \gamma > 0 \)). Classical binary SVM systems can also be extended to handle multi-class classification problems. For further understanding, the reader is encouraged to read some of the classic texts in this area [18, 19].
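In practice, a linear SVM over bag-of-words features is a common baseline for document-level sentiment classification. The sketch below, using scikit-learn, is illustrative only; the tiny training corpus and default hyperparameters are assumptions, and real applications would tune the regularization parameter and kernel on held-out data.

```python
# Linear SVM sentiment classifier over TF-IDF bag-of-words features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = ["loved the product, works great", "completely useless, waste of money",
              "excellent quality and fast delivery", "broke after one day, very poor"]
train_labels = [1, 0, 1, 0]                      # 1 = positive, 0 = negative

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_docs, train_labels)

print(model.predict(["very poor quality, waste of money"]))
```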

Artificial neural networks (ANN) The fundamental concept of artificial neural networks (ANN) was developed by McCulloch and Pitts [20]. At a very basic level, an artificial neural network is represented by a number of interconnected nodes (neurons) that can take a set of inputs, process these inputs and generate a set of outputs (Fig. 4). The basic processing unit of these networks is a neuron that generates neural impulses by taking a weighted sum of the input signals and transforming it through a transfer function (f). The processing of the input signals (stimuli) depends on the respective connection weights, while the transfer function accounts for any possible nonlinearities. The learning ability in this setup is achieved by modulating the weights through predefined procedures called learning algorithms.

Fig. 4 Mechanism of an artificial neuron

Referring to Fig. 4, the unidirectional signal flow from inputs \( x_{1} ,x_{2} , \ldots ,x_{n} \) leads to a neuron’s output signal (O) that is generated as follows:

$$ O = f\left( {\sum\limits_{i = 1}^{n} {w_{i} x_{i} } } \right), $$
(10)

where \( w_{i} \) is a weight vector and \( f( \cdot ) \) is the transfer function. The output O is determined by a threshold (θ) based rule. Accordingly, we can define:

$$ O = f\left( {\sum\limits_{i = 1}^{n} {w_{i} x_{i} } } \right)\left\{ {\begin{array}{*{20}l} { = 1,\quad {\text{if}}\;w^{T} x \ge \theta } \hfill \\ { = 0,\quad {\text{if}}\;w^{T} x < \theta } \hfill \\ \end{array} } \right.. $$
(11)

Extending this idea of the basic processing unit (or neuron), the overall architecture of a neural network can be defined in terms of three interconnected layers: an input layer, hidden layers and an output layer (Fig. 5). In feed-forward networks, the signal flows strictly in the forward direction (i.e. from input to output units) only. Recurrent networks, on the other hand, allow feedback connections through which signals can flow backwards. In most neural networks the connection weights are updated through one of three learning approaches: supervised, unsupervised or reinforcement learning. In supervised learning the network is fed with an input vector and a set of desired responses, and the errors between the desired and actual responses (produced by the neural network) at each node in the output layer are used to modify the connection weights. The best known examples of such learning rules are the perceptron algorithm, the delta rule and the backpropagation algorithm. In the unsupervised learning approach, instead of using an a priori set of desired outputs, the ANN automatically discovers statistically salient features of the input vectors. Finally, in reinforcement learning, the system learns through trials and identifies those actions that maximize a reward signal. Once trained, the learned rules defined in terms of these estimated weights can be used to predict document class memberships.

Fig. 5 Artificial neural network
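To make Eqs. (10) and (11) concrete, the toy sketch below implements a single threshold neuron trained with the classical perceptron rule. The input features (counts of positive and negative words per document) and the learning rate are assumptions chosen purely for illustration; practical sentiment classifiers use multi-layer networks trained with backpropagation.

```python
# Toy threshold neuron (Eqs. 10-11) trained with the perceptron learning rule.
import numpy as np

def step(z, theta=0.0):
    return 1 if z >= theta else 0              # fire when the weighted sum reaches the threshold

def train_perceptron(X, y, lr=0.1, epochs=20):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            o = step(np.dot(w, x_i) + b)       # weighted sum passed through the transfer function
            w += lr * (y_i - o) * x_i          # adjust weights in proportion to the error
            b += lr * (y_i - o)
    return w, b

# Assumed features per document: [count of positive words, count of negative words]
X = np.array([[3, 0], [0, 2], [2, 1], [0, 3]])
y = np.array([1, 0, 1, 0])                     # 1 = positive document, 0 = negative
w, b = train_perceptron(X, y)
print(step(np.dot(w, np.array([1, 0])) + b))   # classify a new document
```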

Naïve Bayes classifier (NB) The Naïve Bayes classifier is the simplest and most commonly used probabilistic classifier. Bayesian inference was first applied to text classification by Mosteller and Wallace [21]. The Naïve Bayes classification model works under the Bag-of-Words (BOW) assumption, which ignores the position of a word within the document. Given a document d and a set of classes \( c \in C \), this model tries to find the class that has the maximum posterior probability. Following the Bayes rule, the fundamental problem of an NB classifier is to find the most likely class that a document belongs to:

$$ \hat{c} = \mathop {\text{argmax}}\limits_{c \in C} P(c\left| d \right.) = \mathop {\text{argmax}}\limits_{c \in C} \frac{P(d|c)P(c)}{P(d)} = \mathop {\text{argmax}}\limits_{c \in C} P(d\left| c \right.)P(c). $$
(12)

A document d can be assumed to be defined by a set of key words \( w_{1} ,w_{2} , \ldots ,w_{n} \). Further assuming that the probabilities \( P\left( {w_{i} \left| c \right.} \right) \) are conditionally independent given the class c, the probability that a document belongs to a class c is given by the class probability multiplied by the product of the conditional probabilities of each word for that class:

$$ \hat{c} = \mathop {\text{argmax}}\limits_{c \in C} P(w_{1} , \, w_{2} , \ldots , \, w_{n} \left| c \right.)P(c) = \mathop {\text{argmax}}\limits_{c \in C} P(c)\prod\limits_{w} {P(w\left| c \right.)} . $$
(13)

In other words, we can write:

$$ P(c)\prod\limits_{i} {P(w_{i} \left| c \right.)} = P(c)\prod\nolimits_{i}^{n} {\frac{{count(w_{i} ,c)}}{{\sum\nolimits_{w} {count(w,c)} }}} = P(c)\prod\nolimits_{i} {\frac{{count(w_{i} ,c)}}{{V_{c} }}} . $$
(14)

Here \( count(w_{i} ,c) \) is the number of occurrences of word \( w_{i} \) in class c, \( V_{c} \) is the total number of words in class c and n is the number of words in the target document. Now, \( V_{c} \) being a constant for a given training set, it can be taken outside the product to get \( P(c)\prod\nolimits_{i} {\frac{{count(w_{i} ,c)}}{{V_{c} }}} = \frac{P(c)}{{V_{c}^{n} }}\prod\nolimits_{i}^{n} {count(w_{i} ,c)} \). For any word absent from the training set, the conditional probability can be replaced by 1 (i.e. the word is effectively ignored); alternatively, a smoothing scheme such as Laplace (add-one) smoothing can be applied to avoid zero probabilities. Based on Eq. (13), a maximum a posteriori decision rule can then be developed to assign a class label to each document based on the respective word frequencies.
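The following sketch implements this counting scheme directly, following Eqs. (13) and (14) on a toy corpus; log-probabilities are used to avoid numerical underflow, and Laplace (add-one) smoothing is used here as the unseen-word treatment.

```python
# Count-based Naive Bayes sketch (Eqs. 13-14) with add-one smoothing.
import math
from collections import Counter, defaultdict

train = [("good great amazing", "pos"), ("great service", "pos"),
         ("bad awful terrible", "neg"), ("terrible food", "neg")]

class_docs = defaultdict(int)            # number of training documents per class
word_counts = defaultdict(Counter)       # word frequencies per class
for text, c in train:
    class_docs[c] += 1
    word_counts[c].update(text.split())

vocab = {w for text, _ in train for w in text.split()}

def classify(text):
    n_docs = sum(class_docs.values())
    scores = {}
    for c in class_docs:
        v_c = sum(word_counts[c].values())               # total number of words in class c
        log_score = math.log(class_docs[c] / n_docs)     # log P(c)
        for w in text.split():
            log_score += math.log((word_counts[c][w] + 1) / (v_c + len(vocab)))
        scores[c] = log_score
    return max(scores, key=scores.get)

print(classify("great amazing food"))
```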

Maximum entropy (ME) classifier The maximum entropy based approach is closely related to Naïve Bayes classification. In the ME classifier, an indicator function (or joint feature) is defined for each word w and class c:

$$ f_{i} (d,c) = \left\{ {\begin{array}{*{20}l} {1,\quad {\text{if}}\;d\;{\text{belongs}}\;{\text{to}}\;{\text{class}}\;c\;{\text{and}}\;{\text{contains}}\;{\text{word}}\;w} \hfill \\ {0,\quad {\text{otherwise}}} \hfill \\ \end{array} } \right. $$
(15)

The expected value of feature \( f_{i} \) with respect to the model \( p(c\left| d \right.) \) is constrained to equal its empirical expectation:

$$ \sum\limits_{c,d} {\tilde{p}(d)p(c|d)f_{i} (d,c)} = \sum\limits_{c,d} {\tilde{p}(d,c)f_{i} (d,c)} . $$
(16)

where \( \tilde{p}(d,c) \) is the empirical distribution of the training data, given by \( \tilde{p}(d,c) = \eta /N \), with η being the number of times \( (d,c) \) occurs in the training dataset. Subsequently, a weight (\( \lambda_{i} \)) is assigned to each of these joint features so as to maximize the log-likelihood of the training data. This weight assignment is carried out using an iterative optimization algorithm. Given the feature weight vector (λ), the probability that a given document d belongs to class c is given by:

$$ p(c\left| d \right.,\lambda ) = \frac{{\exp \left( {\sum\nolimits_{i} {\lambda_{i} f_{i} (d,c)} } \right)}}{{\sum\nolimits_{{c^{\prime} \in C}} {\exp \left( {\sum\nolimits_{i} {\lambda_{i} f_{i} (d,c^{\prime})} } \right)} }}. $$
(17)

Once again, just like the Naïve Bayes classifier, this probability rule can be used to classify a given document by indicating its most probable class membership.
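Because a maximum entropy classifier with the indicator features of Eq. (15) is equivalent to (multinomial) logistic regression, it can be sketched with an off-the-shelf implementation, as below. The binary word-presence features and the tiny corpus are illustrative assumptions.

```python
# Maximum entropy (multinomial logistic regression) sentiment classifier sketch.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["what a wonderful stay", "awful experience, never again",
        "wonderful food and service", "never again, truly awful"]
labels = ["pos", "neg", "pos", "neg"]

model = make_pipeline(CountVectorizer(binary=True),      # word-presence indicator features
                      LogisticRegression(max_iter=1000)) # learns the weights lambda_i
model.fit(docs, labels)

print(model.predict_proba(["wonderful service"]))        # class probabilities as in Eq. (17)
```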

Bayesian networks (BN) In contrast to Naïve Bayes or maximum entropy classifiers, the Bayesian network model aims to capture the complete relationship structure among a set of variables in terms of their conditional dependencies and to specify a complete joint probability distribution over all the variables. In general, given a Bayesian network with n nodes \( x_{1} ,x_{2} , \ldots ,x_{n} \), the corresponding joint probability distribution is given by \( P(x_{1} ,x_{2} , \ldots ,x_{n} ) = \prod\nolimits_{i = 1}^{n} {P\left( {x_{i} \left| {P_{a} (x_{i} )} \right.} \right)} \), where \( P_{a} (x_{i} ) \) denotes the set of parents of node \( x_{i} \). Bayesian networks can often be represented adequately using directed acyclic graphs. However, complete probabilistic treatments can prove computationally expensive for most real-life sentiment mining problems. Although directional separation (d-separation) and Markov property assumptions can alleviate this issue to an extent, BNs are still used very infrequently in opinion mining.

Unsupervised sentiment analysis (Bayesian topic sentiment models) It is evident from our discussion so far that sentiment analysis aims to classify documents into a set of predefined categories. Supervised techniques achieve this by using a large number of pre-annotated training texts. However, creating pre-annotated training documents is an expensive and time consuming task, and documents annotated through human intervention inherently introduce biases into the training process. In addition, many domain specific sentiment models fail to produce satisfactory results in a different domain. Unsupervised learning tries to address these issues by minimizing the dependency on annotated training data. However, using unsupervised approaches for sentiment analysis is challenging because the predictive accuracy of such models often barely beats the chance baseline. Recently, Bayesian approaches to unsupervised sentiment analysis have received a lot of attention because these models allow the inclusion of prior information into the model estimation and can improve prediction accuracy to a significant extent. Most Bayesian unsupervised sentiment analysis models rely on Latent Dirichlet Allocation [22] to jointly detect the underlying topics in a document and identify whether the semantic orientation of the given text is positive, negative or neutral.

Latent Dirichlet Allocation (LDA) is a generative probabilistic framework that automatically discovers hidden topics in text by considering each document in a corpus to be a mixture of topics. Each topic, in turn, is assumed to follow a distribution over a fixed vocabulary of terms. Accordingly, the LDA process can be expressed by the simple plate notation presented in Fig. 6. In this graphical representation, α is a k-dimensional vector of symmetric Dirichlet priors, where k is the number of underlying topics, β denotes the conditional probabilities of words given topics, θ is a document specific vector of topic probabilities, w is the observed document specific vector of words and z is the document specific choice of topic for each word.

Fig. 6 Latent Dirichlet allocation: the generative process (reproduced with permission from Blei et al. [22])

Consequently, the generative process can be described as follows:

  • For each document \( d \in D \) in collection D:

    • Draw \( \theta \sim Dirichlet\left( \alpha \right) . \)

  • For each word (\( w_{n} \)) in a specific document d of length N:

    • Draw a topic \( z_{n} \sim Multinomial\left( \theta \right) \, \)

    • Draw a word \( w_{n} \) from a multinomial probability \( p\left( {w_{n} |z_{n} ,\beta } \right) \) conditioned on the topic \( z_{n} \).

This generative process leads to a joint probability of \( {\varvec{\uptheta}}^{d} ,{\mathbf{z}}^{d} \) and \( {\mathbf{w}}^{d} \) given parameters \( \alpha \;{\text{and}}\;\beta \):

$$ p\left( {{\varvec{\uptheta}}^{d} ,{\mathbf{z}}^{d} ,{\mathbf{w}}^{d} |\alpha ,\beta } \right) \, = \, p\left( {{\varvec{\uptheta}}^{d} |\alpha } \right)\prod\limits_{n = 1}^{N} {p\left( {z_{n}^{d} |{\varvec{\uptheta}}^{d} } \right)p\left( {w_{n}^{d} \left| {z_{n}^{d} ;\beta } \right.} \right)} \, $$
(18)

Marginalizing out \( z\;{\text{and}}\;\theta \) leads to the probability of words in a document:

$$ p\left( {{\mathbf{w}}^{d} \left| {\alpha ,\beta } \right.} \right) = \int {p\left( {{\varvec{\uptheta}}^{d} \left| \alpha \right.} \right)\prod\limits_{n = 1}^{N} {\sum\limits_{{z_{n}^{d} }} {p\left( {z_{n}^{d} \left| {{\varvec{\uptheta}}^{d} } \right.} \right)p\left( {w_{n}^{d} \left| {z_{n}^{d} ;\beta } \right.} \right)d\theta^{d} } } } \, . $$
(19)

Consequently, the probability of all documents in the corpus is given by the product of marginal distributions:

$$ p\left( {D\left| {\alpha ,\beta } \right.} \right) = \prod\limits_{d = 1}^{D} {\int {p\left( {{\varvec{\uptheta}}^{d} \left| \alpha \right.} \right)\prod\limits_{n = 1}^{N} {\sum\limits_{{z_{n}^{d} }} {p\left( {z_{n}^{d} \left| {{\varvec{\uptheta}}^{d} } \right.} \right)p\left( {w_{n}^{d} \left| {z_{n}^{d} ;\beta } \right.} \right)d\theta^{d} } } } } \, . $$
(20)

Now, the posterior of the hidden variables \( {\varvec{\uptheta}}^{d} \) and \( {\mathbf{z}}^{d} \) for a given document d can be expressed in terms of Eqs. (18) and (19):

$$ p\left( {{\varvec{\uptheta}}^{d} ,{\mathbf{z}}^{d} \left| {{\mathbf{w}}^{d} } \right.,\alpha ,\beta } \right) = \frac{{p\left( {{\varvec{\uptheta}}^{d} ,{\mathbf{z}}^{d} ,{\mathbf{w}}^{d} \left| {\alpha ,\beta } \right.} \right)}}{{p\left( {{\mathbf{w}}^{d} \left| {\alpha ,\beta } \right.} \right)}}. $$
(21)

However, the denominator of the posterior is not tractable, as both variables \( {\varvec{\uptheta}}^{d} \) and \( {\mathbf{z}}^{d} \) are latent in nature, and consequently the posterior expectation cannot be calculated in a straightforward manner. Researchers have proposed a number of ways to solve this problem, beginning with variational Bayes inference, where the generative model is expressed in terms of a simpler distribution with fewer dependencies, obtained by removing the edges between \( \theta , \, z \) and w (Fig. 7).

Fig. 7 Latent Dirichlet allocation: variational inference (reproduced with permission from Blei et al. [22])

Based on this simplified notation, the approximate posterior distribution takes the following form:

$$ q^{d} \left( {\theta^{d} ,z^{d} \left| {\gamma^{d} ,\varphi^{d} } \right.} \right) = q\left( {\theta^{d} \left| {\gamma^{d} } \right.} \right)\prod\limits_{n = 1}^{N} {q\left( {z_{n}^{d} \left| {\varphi_{n}^{d} } \right.} \right)} , $$
(22)

where \( q( \cdot ) \) denotes an approximate posterior function, and γ and \( \varphi \) denote the variational parameters. Using an EM (Expectation Maximization) algorithm, the posterior can now be approximated by iterating through the following alternating steps:

  1. E (Expectation) step: Find the best approximate posterior function \( q^{d} \left( {\theta^{d} , \, z^{d} \left| {\gamma^{d} ,\varphi^{d} } \right.} \right) \) for each document.

  2. M (Maximization) step: Maximize the resulting lower bound with respect to α and β.

Although variational EM is the most frequently used algorithm for parameter estimation in LDA-related models, it remains an approximation method. Consequently, Gibbs-based samplers have also been proposed for estimating LDA [23]. The Gibbs approach is a Markov chain Monte Carlo (MCMC) algorithm for sampling a sequence of approximate observations from a specified joint probability distribution of two or more random variables \( p({\mathbf{z}}) = p\left( {z_{1} ,z_{2} , \ldots ,z_{n} } \right) \). The process begins by initializing state values for \( \left\{ {z_{i} :i = 1, \ldots ,N} \right\} \) and then iterating through a sampling process where each variable \( z_{i} \) is sampled from its conditional distribution given the remaining variables, i.e. \( p\left( {z_{i} \left| {z_{ - i} } \right.} \right) \). This procedure is repeated a number of times until the samples converge to the true target distribution.
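Both inference schemes are available in off-the-shelf libraries, so fitting a plain (sentiment-free) LDA model in practice requires little code. The sketch below uses scikit-learn's implementation, which is based on online variational Bayes; Gibbs-sampling based implementations follow the same usage pattern. The toy corpus and the choice of two topics are assumptions for illustration.

```python
# Plain LDA topic model sketch on a toy review corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["battery life is great and the screen is sharp",
        "screen cracked and the battery died quickly",
        "delivery was fast and the packaging was neat",
        "slow delivery and damaged packaging"]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)                  # per-document topic proportions (theta)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):        # per-topic word weights (related to beta)
    top_terms = [terms[i] for i in topic.argsort()[-4:]]
    print(f"Topic {k}:", top_terms)
```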

The Gibbs sampling approach to LDA aims to find the latent document specific topic proportions (\( \theta_{d} \)), the topic specific word distributions (\( \varphi \)), and the topic assignment \( z_{i} \) for each word \( w_{i} \). However, the topic index assignments (\( z_{i} \)) are sufficient to determine both \( \theta_{d} \) and \( \varphi \). Therefore, in principle these parameters (\( \theta_{d} \) and \( \varphi \)) can be integrated out so as to focus on just computing the topic indexes (\( z_{i} \)) given all other topic assignments to all other words. This kind of Gibbs sampling scheme is called a collapsed Gibbs sampler (with both \( \theta_{d} \) and \( \varphi \) being “collapsed out”). Therefore, denoting all topic allocations other than \( z_{i} \) by \( z_{ - i} \), the following posterior probability can be defined up to a normalizing constant:

$$ p\left( {z_{i} \left| {{\mathbf{z}}_{ - i} } \right.,\alpha ,\beta ,w} \right) = \frac{{p\left( {z_{i} ,{\mathbf{z}}_{ - i} ,{\mathbf{w}}\left| {\alpha ,\beta } \right.} \right)}}{{p\left( {{\mathbf{z}}_{ - i} ,{\mathbf{w}}\left| {\alpha ,\beta } \right.} \right)}} \propto p\left( {z_{i} ,{\mathbf{z}}_{ - i} ,w\left| {\alpha ,\beta } \right.} \right) = p\left( {{\mathbf{w}},{\mathbf{z}}\left| {\alpha ,\beta } \right.} \right). $$
(23)

Moreover,

$$ \begin{aligned} p\left( {{\mathbf{w}},{\mathbf{z}}\left| {\alpha ,\beta } \right.} \right) \hfill \\ = \iint {p\left( {{\mathbf{z}},{\mathbf{w}},\theta ,\varphi \left| {\alpha ,\beta } \right.} \right)d\theta d\varphi } = \iint {p\left( {\varphi |\beta } \right)p\left( {\theta |\alpha } \right)p\left( {z|\theta } \right)p\left( {w\left| {\varphi_{z} } \right.} \right)d\theta d\varphi } \hfill \\ = \int {p\left( {z\left| \theta \right.} \right)} p\left( {\theta \left| \alpha \right.} \right)d\theta \int {p\left( {w\left| {\varphi_{z} } \right.} \right)} p\left( {\varphi \left| \beta \right.} \right)d\varphi . \hfill \\ \end{aligned} $$
(24)

Both terms in Eq. (24) being multinomials with Dirichlet priors, the conjugacy property can be used to show that

$$ p\left( {{\mathbf{w}},{\mathbf{z}}\left| {\alpha ,\beta } \right.} \right) = \prod\limits_{d} {\frac{{B\left( {n_{d,\cdot} + \alpha } \right)}}{B\left( \alpha \right)}} \, \prod\limits_{k} {\frac{{B\left( {n_{k,\cdot} + \beta } \right)}}{B\left( \beta \right)}} ,{\text{and}} $$
(25)
$$ \begin{aligned} p\left( {z_{i} \left| {{\mathbf{z}}^{{\left( { - i} \right)}} } \right.,{\mathbf{w}}} \right) & = \frac{{p\left( {{\mathbf{w}}\text{,}{\mathbf{z}}} \right)}}{{p\left( {{\mathbf{w}}\text{,}{\mathbf{z}}^{{\left( { - i} \right)}} } \right)}} = \frac{{p\left( {\mathbf{z}} \right) }}{{p\left( {{\mathbf{z}}^{{\left( { - i} \right)}} } \right)}} \cdot \frac{{p\left( {{\mathbf{w}}\left| {\mathbf{z}} \right.} \right)}}{{p\left( {{\mathbf{w}}^{{\left( { - i} \right)}} \left| {{\mathbf{z}}^{{\left( { - i} \right)}} } \right.} \right)p\left( {w_{i} } \right)}} \\ & \propto \left( {n_{d,k}^{{\left( { - i} \right)}} + \alpha_{k} } \right) \frac{{n_{k,w}^{{\left( { - i} \right)}} + \beta_{w} }}{{\sum\nolimits_{w'} {n_{{k,w^{\prime } }}^{{\left( { - i} \right)}} + \beta_{{w^{\prime } }} } }}, \\ \end{aligned} $$
(26)

where k represents the topic, \( n_{d,k} \) indicates the number of words assigned to topic k in document d and \( n_{k,w} \) denotes the number of times word w is assigned to topic k. Once Gibbs sampling is finished, these counts can be used to compute the latent distributions \( \theta_{d} \) and \( \varphi_{k} \). This basic framework has been extended in many ways to incorporate sentiment analysis. The Joint Sentiment Topic model (JST) is one such framework that models word generation for positive or negative sentiment conditioned on topics [24]. The plate diagram for the JST model is presented in Fig. 8. Compared to some of the existing semi-supervised methods, JST shows a significant performance gain of between 10 and 20%.

Fig. 8 Joint sentiment/topic model plate diagram (reproduced with permission from Lin and He [24])

Denoting a collection of D documents (\( d_{1} ,d_{2} , \ldots ,d_{D} \)), each associated with a sequence of words \( (w_{1} ,w_{2} , \ldots ,w_{{N_{d} }} ) \) where each word comes from a vocabulary of distinct terms \( 1,2, \ldots ,V \), JST models S distinct sentiment labels and T topic labels simultaneously. The corresponding generative mechanism can be described as follows [24]:

  • For each sentiment label \( l \in \left\{ {1, \ldots ,S} \right\} \) and each topic \( j \in \left\{ {1, \ldots ,T} \right\} \), draw \( \varphi_{lj} \sim Dirichlet\left( {\lambda_{l} \times \beta_{lj}^{T} } \right) \).

  • For each document d, choose a distribution \( \pi_{d} \sim Dirichlet(\gamma ) \)

  • For each sentiment label l under document d, choose a distribution \( \theta_{d,l} \sim Dirichlet(\alpha ) \).

  • For each word \( w_{i} \) in document d

    • Choose a sentiment label \( l_{i} \sim Multinomial\left( {\pi_{d} } \right) \)

    • Choose a topic \( z_{i} \sim Multinomial\left( {\theta_{d,l_{i} } } \right) \)

    • Choose a word \( w_{i} \) from a multinomial distribution over words conditioned on topic \( z_{i} \) and sentiment label \( l_{i} \), denoted by \( \varphi_{{l_{i} z_{i} }} \).

A very similar approach is taken by the Aspect Sentiment Unification Model (ASUM) which, in contrast to JST, focuses on the regional co-occurrence of words in a document by imposing the constraint that all the words in a given sentence must originate from the same language model [25]. In some sense both ASUM and JST are semi-supervised in nature in that they use a small set of sentiment seed words. There are several other models (e.g. the Topic Sentiment Mixture (TSM) model, the Multi-Aspect Sentiment (MAS) model etc.) that extend the basic LDA model to handle topic and sentiment together [14, 26]. Very recently, researchers have proposed more advanced methods based on text-based hidden Markov models, which can use the sequence of words in training texts instead of a predefined sentiment lexicon to classify implicit opinions [27].

Apart from machine learning based classification, sentiment classification can also be carried out using the lexicon-based methods mentioned earlier, namely dictionary based and corpus based approaches. Therefore, for the sake of completeness, we briefly touch upon these two methods and cite some relevant literature.

  • Dictionary-based approach

The dictionary based approach begins with a small set of seed words collected manually and recursively extends this initial word list by collecting related synonyms and antonyms from appropriate dictionaries, e.g. WordNet [28]. This process continues until no new words can be added to the seed word list. However, the main weakness of this family of methods is its inherent inability to find domain and context specific opinion words; the corpus-based approach, discussed below, tries to address this limitation by relying on syntactic patterns of co-occurrence. In addition to WordNet, several other dictionaries have been developed to examine specific aspects of human sentiment. The most prominent among them are: (1) the Harvard General Inquirer [29], the oldest manually constructed word list, organized into 17 distinct semantic categories; (2) SenticNet [30], an extension of WordNet consisting of words related to four emotional dimensions (sensitivity, aptitude, attention, and pleasantness) and their polarity; (3) the Valence Aware Dictionary for Sentiment Reasoning (VADER) [31], developed specifically for the shorter texts found in social media contexts; (4) the EmoLex [32] database, which consists of word lists related to particular emotions (e.g., anger, anticipation, disgust, fear, joy, sadness, surprise, and trust); (5) the Affective Norms for English Words (ANEW) database [33], which includes affective norms for valence, pleasure, arousal, and dominance; and (6) the SentiWordNet dictionary, which enriches the WordNet word list with three sentiment scores: positivity, negativity and objectivity [34].
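As a brief illustration of dictionary based scoring, the sketch below uses the VADER lexicon shipped with NLTK; it assumes the vader_lexicon resource has been downloaded, and the 0.05 cut-off on the compound score is a commonly used, but not mandatory, convention.

```python
# Dictionary-based polarity scoring sketch with the VADER lexicon (via NLTK).
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
reviews = ["The room was spotless and the staff were wonderful!",
           "Horrible check-in experience and a dirty bathroom."]

for review in reviews:
    scores = analyzer.polarity_scores(review)   # neg/neu/pos proportions and a compound score
    label = "positive" if scores["compound"] >= 0.05 else "negative or neutral"
    print(label, scores)
```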

  • Corpus-based approach

Augmenting the word search process with linguistic constraints, the corpus-based approach aims to find contextual opinion words and their opinion orientations. The extraction of opinion words and their sentiment polarity is often facilitated by sequential learning algorithms such as Conditional Random Fields (CRF) [35]. This approach is suitable for the creation and visualization of comparative relation maps, which are often used as an important tool in enterprise risk management and decision making. The two broad categories of corpus based sentiment analysis are statistical and semantic approaches. Statistical methods use the simple underlying principle that the frequency of occurrence of a word in an annotated text corpus can be a robust indicator of its polarity, i.e. words occurring more frequently in positive texts are positive while words occurring more frequently in negative texts are negative; words with roughly equal frequency across positive and negative texts are neutral. The semantic approach, on the other hand, assigns similar sentiment polarities to words that are semantically similar. Corpus based approaches often adopt a mixture of tools such as pointwise mutual information (PMI) or its extensions (semantic orientation from PMI), HowNet-based similarity measures, Latent Semantic Analysis (LSA), and higher dimensional semantic spaces derived from lexical co-occurrence patterns (e.g. Hyperspace Analogue to Language or Sentiment Hyperspace Analogue to Language) [36]. The corpus-based approach has been used in a wide variety of contexts due to its ability to identify domain specific sentiments and their orientations; however, its dependency on the corpus is an inherent weakness of such methods. Given this overview of the overall opinion mining process, we now move on to the next section, which presents a brief overview of some significant research works that have applied text/sentiment mining in a wide variety of managerial contexts.

3 Sentiment analysis in management research

3.1 Understanding market structure and customer perceptions

Understanding consumer sentiments regarding products and brands is an important area of research, as such knowledge can be used by firms to design their product and service offerings, communication strategies and branding decisions. There is a large body of research that handles these tasks using econometric and psychometric tools. In most cases these tools rely on consumer responses collected through statistically robust survey designs, field studies, or experimental setups. However, user generated content is increasingly emerging as a dominant source of information that can be effectively used by firms to handle these tasks.

Given its strategic implications, a number of studies have examined various aspects of online opinion. These include how online evaluations affect demand [37], how such content gets created [38], and how firms should strategically respond to online consumer reviews [39]. Due to its informational richness, review text has been successfully used to discover key product attributes [40], to make product recommendations [41], and to determine market structure [42]. It has been demonstrated that by analyzing the linguistic cues associated with deception, it is possible to weed out those product reviews written by reviewers who have not actually tried the product concerned [43]. Studies have also shown that even the affective content and linguistic style of online text reviews can have a significant effect on conversion rates [44]. More recently, researchers have utilized online reviews to predict consumers’ purchase intentions for durable goods [45] and to infer ratings of different product attributes [46]. User generated content from social networks such as Twitter has also been used for consumer insight mining [47].

Research in this domain has also examined consumers’ behaviors [2], the role of social networks [48], and social influence [49]. Online textual content has been found to have a significant impact on firm performance in terms of brand image, purchase intentions, sales, return on investment [50], and stock prices [51, 52]. More recently, researchers have focused on how brand specific sentiments differ across various types of social platforms [3, 52]. From a methodological perspective, modified LDA based topic models have recently been proposed that can effectively analyze unstructured consumer reviews and identify consumer opinions at the sentence level. These models have also been extended to accommodate opinion stickiness, where the reviewer talks about the same topic over a number of consecutive sentences [53]. Such inertia across sentences needs to be handled properly as it violates the independent and identical topic distribution assumption of classical topic models such as LDA. Examination of interaction dynamics in virtual communities and their underlying emotional antecedents has also attracted some attention. Muniz and O’Guinn [6] find that consumers often choose to participate in communities whose members share their interests and opinions. In a virtual community, the inclination to post a comment is driven by individual level goals (e.g. solving problems or helping others by offering technical advice) and motivations that can be either intrinsic satisfaction or social benefits such as reputation [54]. Such works are important because social media has emerged as an important component of firms’ communication strategies.

3.2 Analysing financial sentiments

The concept of financial market sentiment has its roots in the fundamental assumption that investors’ decisions are driven by their sentiments [55]. Market dynamics arise out of two factors: the transient sentiments of irrational traders (who are subject to exogenous sentiment) and the limited ability of rational arbitrageurs to arbitrage. These limits arise for several reasons: short time horizons, the costs associated with trading or short selling, and the fact that betting against irrational traders is inherently risky [56]. The vulnerability of sentimental or irrational investors to exogenous sentiment makes their behavior responsive to outside information available from various sources such as financial news, press releases etc. Past research has shown that financial news can potentially affect the market [57] by impacting market returns [58, 59], intra-market volatility [60], and the profitability of different types of portfolios [61].

Analyzing financial sentiments using linguistic and opinion mining techniques has started drawing more research attention due to the increasing acknowledgment of its prominent role in influencing market dynamics. In an early paper in this direction, Knowles showed that the state of the financial market is often described in terms of health metaphors [62]. More recently, various sentiment analysis techniques have been applied to financial news analysis. Sentiments expressed in stock discussion boards have been found to affect the price level of a technology stock index [63]. Specialized computational linguistics systems have been developed to predict stock prices and market volatility [64]. It has been found that the extent of pessimism in financial news columns can significantly affect a company’s cash flow [65, 66]. Tetlock and colleagues [67] used a dictionary based approach for sentiment analysis to examine the relation between the Dow Jones Industrial Index and a pessimism index; specifically, they utilized the General Inquirer dictionary proposed by Stone et al. [68]. However, despite the argument that opinion sentiments can play an important role in determining financial market dynamics, the results from this research domain are not always unanimous or conclusive. Contrary to the findings of Tetlock [65], who suggests that textual sentiment can effectively supplement information related to firms’ fundamentals, Tetlock et al. [67] find that sentiment analysis does not provide any significant additional information. Li [69] also reports that the stock market fails to reflect the textual information regarding firms’ future profitability available from annual reports. Sinha [70] finds evidence that the stock market generally underreacts to news sentiment. These contradictory results, however, are an indication that our understanding of opinion dynamics in financial markets is still incomplete and that there is a strong need for further research in this field. In conclusion, it seems that textual information can informationally enrich and augment the conventional indicators of financial performance and play a significant role in determining market movements. Consequently, in agreement with the strongest form of the efficient market hypothesis, good financial market models might be justified in incorporating textual sentiment as an additional factor along with other firm-level characteristics.

3.3 Examining accounting practices

Understanding the textual information in corporate disclosures is important for financial accounting research. With significant advancements in the fields of computational linguistics, text mining, and machine learning over the past two decades, accounting researchers now have access to powerful tools to better understand financial disclosures and corporate communications. These communications often indicate important managerial characteristics of the firm and thus have significant implications for understanding managers’ behavioral biases and predicting corporate decisions. This argument is supported by past research in this field suggesting that communication patterns during critical decision making processes can reveal critical organizational characteristics and indicate a firm’s future performance. Recent findings also suggest that the level of optimism in earnings releases is positively associated with the market’s short-term response [71]. In general, past evidence indicates that the extent of pessimistic sentiment cues in earnings disclosures is correlated with lower future return on assets [72], while the optimism and certainty embedded in such announcements are positively associated with future earnings and expected earnings uncertainty [73]. However, negative sentiment prior to earnings announcements has been found to be inversely associated with earnings surprises [74]. Based on these findings, it seems fair to conclude that firms’ fundamentals often determine the textual sentiment in corporate disclosures, and that this sentiment has the potential to offer additional information about firms’ future performance that cannot be captured completely by conventional quantitative measures. Consequently, incorporating textual sentiment as an additional covariate along with the usual firm-level fundamentals can prove to be a fruitful avenue for future research.

4 Future directions

So far, we have seen how sentiment analysis can be defined, the basic steps involved in implementing a sentiment analysis (SA) system, the different approaches to extracting sentiment from a corpus of opinionated texts, and a brief overview of various applications of sentiment analysis or opinion mining in the business domain. We now discuss a few possible future extensions of existing sentiment analysis techniques to solve some very specific problems. In this section we touch upon managerial issues related to customers and competition, along with an important methodological issue that needs further attention.

4.1 Understanding customers

Conventionally, customer-driven marketing strategy development depends on various tools to understand the perceptions and needs of customers. The success of these tools largely depends on input data that is collected through survey instruments. However, such data often suffer from various biases introduced by the survey tools. An alternative approach, therefore, is to utilize data that is voluntarily contributed by consumers. Online reviews offer such data, but in the form of unstructured text. Specifically, it can be argued that given a product-specific textual review, we can ideally extract various product-specific aspects and their respective sentiments from the text. Subsequently, the review-specific overall sentiment \( r_{dl} \) associated with a review d contributed by reviewer l can be expressed as a weighted sum of latent sentiments regarding various product-specific aspects. In principle this is similar to the generative latent rating regression (LRR) model proposed by Wang et al. [15]. The aspect identification part can be executed using a bootstrapping step that relies on a Chi-square-based measure of dependence between aspects and words. At the end of the aspect segmentation process, each review d is associated with a word frequency matrix \( (w_{d} ) \) that gives the normalized frequency of words in each aspect. The LRR model treats \( w_{d} \) as independent variables and the overall rating r of the review as the dependent variable. Formally, an aspect sentiment rating \( X_{il} \) is determined as follows:

$$ X_{il} = \sum\limits_{j = 1}^{n} {\omega_{ijl} w_{dijl} } $$
(27)

where \( w_{dijl} \) represents the frequency of the jth term belonging to the ith aspect in the dth review contributed by the lth reviewer, and \( \omega_{ijl} \) represents the corresponding individual-specific term weight. Similarly, the overall rating \( r_{l} \) associated with the lth reviewer can be assumed to follow a Gaussian distribution. Thus we have

$$ r_{l} \sim N\left( {\sum\limits_{i = 1}^{p} {\beta_{il} X_{il} } ,\;\sigma^{2} } \right) $$
(28)

where \( X_{il} \) denotes the aspect (i) and subject (l) specific opinion rating and \( \sigma^{2} \) indicates the uncertainty of the overall rating predictions. This basic modelling framework can be easily extended to incorporate a model-based segmentation approach using finite mixture models. Specifically, instead of assuming that all the individual-level coefficients are generated from a single population-level multivariate normal distribution with mean \( \mu_{\beta } \) and covariance matrix \( \varSigma_{\beta } \), i.e. \( \beta_{l} \sim N_{p} \left( {\mu_{\beta } ,\varSigma_{\beta } } \right) \), we can assume that each customer l belongs to one of K segments. The distribution of parameter heterogeneity in segment \( k \in \{ 1,2 \ldots K\} \) is assumed to follow a Gaussian distribution with mean \( \theta_{k} \) and variance–covariance matrix \( \varLambda_{k} \):

$$ P\left( {\beta_{l} \left| k \right.,\theta_{k} ,\varLambda_{k} } \right) = N\left( {\theta_{k} ,\varLambda_{k} } \right). $$
(29)

Segment membership is assumed to be unknown, while the prior probability of belonging to segment k is denoted by \( \psi_{k} \). In order to identify the model, the probabilities are ordered: the first segment is the smallest, and the last segment is the largest. This conceptualization induces a mixture model for the marginal distribution of \( r_{l} \). Denoting an identity matrix by I and integrating out \( \beta_{l} \), it can be shown that for a subject belonging to segment k:

$$ r_{l} = X_{l} \theta_{k} + \varepsilon_{l} (k),{\text{and}} $$
(30)
$$ V\left( {\varepsilon_{l} (k)} \right) = \sigma^{2} I + X_{l} \varLambda_{k} X_{l}^{\prime } . $$
(31)

Following the standard Bayesian estimation approach, we further assume the following priors for the various parameters:

$$ P\left( {\sigma^{2} \left| {r_{0} ,s_{0} } \right.} \right) = IG\left( {\sigma^{2} \left| {\tfrac{{r_{0} }}{2},\tfrac{{s_{0} }}{2}} \right.} \right) $$
(32)
$$ P\left( {\theta_{k} \left| {u_{0} ,V_{0} } \right.} \right) = N_{p} \left( {\theta_{k} \left| {u_{0} ,V_{0} } \right.} \right) $$
(33)
$$ P\left( {\varLambda_{k} \left| {f_{0} ,G_{0} } \right.} \right) = IW_{p} \left( {\varLambda_{k} \left| {f_{0} ,G_{0}^{ - 1} } \right.} \right),{\text{and}} $$
(34)
$$ P\left( {\psi \left| {W_{0} } \right.} \right) = ODir_{K} (\psi \left| {W_{0} } \right.). $$
(35)

The joint distribution of the proposed hierarchical Bayesian mixture model becomes:

$$ \begin{aligned} \prod\limits_{l} {N(r_{l} \left| {X_{l} \beta_{l} ,} \right.\sigma^{2} )} & \times \prod\limits_{l} {N_{p} (\beta_{l} \left| {\theta_{k} } \right.,\varLambda_{k} )} \times \prod\limits_{k} {N_{p} (\theta_{k} \left| {u_{0} } \right., \, V_{0} )IW_{p} (\varLambda_{k} \left| {f_{0} } \right., \, G_{0}^{ - 1} )} \\ & \times MN_{K} (N\left| \varPsi \right.)ODir_{K} (\varPsi \left| {W_{0} } \right.) \times IG(\sigma^{2} \left| {\tfrac{{r_{0} }}{2}} \right., \, \tfrac{{s_{0} }}{2}). \\ \end{aligned} $$
(36)

In the above equations IG, IW, \( ODir \) and MN represent the inverse Gamma, inverse Wishart, ordered Dirichlet and multinomial distributions, respectively. Setting proper initial values for the prior parameters \( r_{0} , s_{0} ,u_{0} , V_{0} , f_{0} , G_{0} \) and \( W_{0} \), the estimation can now follow a standard Markov Chain Monte Carlo method such as a Gibbs sampler or a Metropolis–Hastings algorithm. The main objective of this extension is to show that sentiment analysis, or a variant thereof (here we do not treat sentiments in the conventional sense of positive or negative orientation but as a much more granular rating expression), can be applied to solve real-life problems in customer management and marketing.
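
To make the preceding building blocks concrete, the following minimal Python sketch (using numpy and scipy) computes the aspect ratings of Eq. (27) and evaluates the segment-level marginal log-density implied by Eqs. (30)–(31), which is the quantity a Gibbs or Metropolis–Hastings sampler would evaluate repeatedly when updating segment memberships. The function names, array layouts and toy inputs are our own illustrative assumptions, not part of the LRR model of Wang et al. [15].

```python
import numpy as np
from scipy.stats import multivariate_normal

def aspect_ratings(word_freq, term_weights):
    """Eq. (27): aspect-level sentiment ratings of one review as weighted sums
    of its normalized within-aspect word frequencies.
    Both arrays are assumed to have shape (n_aspects, n_terms)."""
    return (term_weights * word_freq).sum(axis=1)

def segment_log_marginal(r, X, theta_k, Lambda_k, sigma2):
    """Eqs. (30)-(31): log-density of a reviewer's overall ratings under segment k,
    with the individual-level coefficients beta_l integrated out. Here we read r as
    the length-T vector of overall ratings given by one reviewer across T reviews
    (one way to interpret the identity matrix I in Eq. (31)) and X as the
    corresponding (T, n_aspects) matrix of aspect ratings."""
    mean = X @ theta_k
    cov = sigma2 * np.eye(len(r)) + X @ Lambda_k @ X.T
    return multivariate_normal.logpdf(r, mean=mean, cov=cov)

# Hypothetical toy usage: 3 aspects, 5 terms per aspect, 4 reviews by one reviewer
rng = np.random.default_rng(0)
X_l = np.vstack([aspect_ratings(rng.random((3, 5)), rng.random((3, 5))) for _ in range(4)])
r_l = rng.random(4) * 5
print(segment_log_marginal(r_l, X_l, theta_k=np.ones(3), Lambda_k=0.1 * np.eye(3), sigma2=0.5))
```

Within a full sampler, these log-marginals, weighted by the prior segment probabilities \( \psi_{k} \), would drive the draws of each reviewer's segment membership.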

4.2 Developing competitive insight

Traditional corporate finance theories posit that firms facing financial constraints (broadly defined as frictions that prevent firms from funding all desired investments) have higher costs of external financing [75]. Financially constrained firms preserve internal finance to generate funds for future investment opportunities. Consequently, it can be argued that the way firms allocate funds across long-term (forward-looking) decisions (e.g. R&D, new product development, branding) and short-term decisions (e.g. promotions) is also dictated to a large extent by their ability to generate funds. In other words, the expected relative intensity of decisions in these areas can be predicted beforehand if we can assess the extent of financial constraints a priori. At a very basic level, this idea can be implemented using the following linear model:

$$ z_{t} = \beta_{0} + \sum\limits_{j = 1}^{J} {\beta_{1j} y_{t - j} } + \sum\limits_{j = 0}^{J} {\sum\limits_{f = 1}^{F} {\beta_{2j}^{f} X_{t - j}^{f} } } + \sum\limits_{j = 0}^{J} {\sum\limits_{a = 1}^{A} {\beta_{3j}^{a} S_{t - j}^{a} } } + \varepsilon_{t} . $$
(37)

The dependent variable \( z_{t} \) denotes a latent performance indicator that the firm’s management can assess but that is unobserved by the outside world (e.g. competitors), \( X^{f} \) is a vector of control variables, and S is a vector of sentiment measures. This is a dynamic model that assumes that the firm’s performance at time t depends on its prior performance over J time periods, captured by \( y_{t - j} \), on control variables \( X_{t - j}^{f} \) and on sentiment terms \( S_{t - j}^{a} \); both \( X_{( \cdot )}^{f} \) and \( S_{( \cdot )}^{a} \) can be contemporaneous or time-lagged. The control variables can be various firm- and market-specific characteristics (e.g. cash flow from operations, the book-to-market ratio, the market value of equity, accruals and leverage, current earnings surprises, analyst earnings forecast revisions and dispersions, volatilities, stock market index returns, and trading volumes [76]). Now, defining \( \beta = \left\{ {\beta_{0} ,\vec{\beta }_{1} ,\vec{\beta }_{2} ,\vec{\beta }_{3} } \right\} \) and \( X_{t} = \left\{ {1,\,y_{t - j;\,j = 1, \ldots ,J} ,\,X_{t - j;\,j = 0, \ldots ,J}^{f} ,\,S_{t - j;\,j = 0, \ldots ,J}^{a} } \right\} \) we can simply write:

$$ z_{t} = X_{t} \beta + \varepsilon_{t} . $$
(38)

Following Albert, Chib [77], the observed decisions (whether the firm engages in long-term investments or not) can therefore be expressed as:

$$ y_{t} = I(z_{t} > 0). $$
(39)

The corresponding Gibbs sampler involves iterative sampling of \( p(\beta |z) \) and \( p(z|\beta ) \). Dropping indices for notational simplicity and assuming a normal prior on the parameter β i.e. \( \beta \sim N\left( {\mu_{\beta } ,\varSigma_{\beta } } \right) \), the conditional distributions are:

$$ \begin{aligned} p(\beta |z) \propto p(z|\beta )p(\beta ) \\ \propto \exp \left[ { - \frac{1}{2}\left\{ {\left( {z - X\beta } \right)^{\prime } \left( {z - X\beta } \right) + \left( {\beta - \mu_{\beta } } \right)^{\prime } \varSigma_{\beta }^{ - 1} \left( {\beta - \mu_{\beta } } \right)} \right\}} \right] \\ \propto \exp \left[ { - \frac{1}{2}\left\{ {\left( {\beta - \tilde{\mu }_{\beta } } \right)^{\prime } \tilde{\varSigma }_{\beta }^{ - 1} \left( {\beta - \tilde{\mu }_{\beta } } \right) + z^{\prime } Pz} \right\}} \right] \propto N(\beta ;\tilde{\mu }_{\beta } ,\tilde{\varSigma }_{\beta } )N(z;0,P^{ - 1} ) \\ {\text{where}}\;\tilde{\varSigma }_{\beta } = \left( {\varSigma_{\beta }^{ - 1} + X^{\prime } X} \right)^{ - 1} ,\;\tilde{\mu }_{\beta } = \tilde{\varSigma }_{\beta } \left( {\varSigma_{\beta }^{ - 1} \mu_{\beta } + X^{\prime } z} \right)\;{\text{and}}\;P = I_{T} - X\tilde{\varSigma }_{\beta } X^{\prime } . \\ \end{aligned} $$
(40)
$$ {\text{So}},p(z\left| {\beta ,y,X} \right.) \propto p(y\left| z \right.)p(z\left| \beta \right.,X) = \prod\limits_{t = 1}^{T} {p(y_{t} \left| {z_{t} } \right.)p(z_{t} \left| \beta \right.,x_{t} )} . $$
(41)

Similar approaches have been used earlier to examine whether textual sentiment can predict the occurrence or non-occurrence of specific events [78]. In contrast, we use a binary probit model to understand the impact of sentiment on a firm’s decisions. Moreover, instead of the conventional maximum likelihood approach, we have shown how such models can be estimated using a Bayesian approach.
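
As a concrete illustration of how the two conditional distributions in Eqs. (40)–(41) translate into code, the sketch below provides a plain numpy/scipy Gibbs sampler for the probit model of Eqs. (38)–(39), together with a small helper that assembles the lagged design matrix of Eq. (37) from a table of firm-level observations. The column-naming scheme, the lag length, the zero-mean Gaussian prior and all function names are illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np
import pandas as pd
from scipy.stats import truncnorm

def lagged_design(df, y_col, control_cols, sentiment_cols, J=2):
    """Assemble X_t of Eq. (38): an intercept, J lags of the observed decision,
    and lags 0..J of the control and sentiment columns of a time-indexed table."""
    X = pd.DataFrame({"const": 1.0}, index=df.index)
    for j in range(1, J + 1):
        X[f"{y_col}_lag{j}"] = df[y_col].shift(j)
    for col in list(control_cols) + list(sentiment_cols):
        for j in range(J + 1):
            X[f"{col}_lag{j}"] = df[col].shift(j)
    keep = X.dropna().index                      # drop rows lost to lagging
    return X.loc[keep].to_numpy(), df.loc[keep, y_col].to_numpy()

def albert_chib_probit(X, y, n_iter=2000, prior_var=100.0, seed=0):
    """Gibbs sampler for y_t = I(z_t > 0), z_t = x_t' beta + eps_t (Eqs. 38-39),
    alternating the two conditionals of Eqs. (40)-(41) under a N(0, prior_var * I)
    prior on beta (our illustrative choice)."""
    rng = np.random.default_rng(seed)
    T, p = X.shape
    S0inv = np.eye(p) / prior_var                # Sigma_beta^{-1}
    post_cov = np.linalg.inv(S0inv + X.T @ X)    # Sigma_tilde of Eq. (40)
    chol = np.linalg.cholesky(post_cov)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        # Step 1 (Eq. 41): latent z_t from normals truncated at zero
        mean = X @ beta
        lo = np.where(y == 1, -mean, -np.inf)    # z_t > 0 when y_t = 1
        hi = np.where(y == 1, np.inf, -mean)     # z_t <= 0 when y_t = 0
        z = mean + truncnorm.rvs(lo, hi, size=T, random_state=rng)
        # Step 2 (Eq. 40): beta | z ~ N(Sigma_tilde X'z, Sigma_tilde) when mu_beta = 0
        beta = post_cov @ (X.T @ z) + chol @ rng.standard_normal(p)
        draws[it] = beta
    return draws
```

The posterior draws of the coefficients attached to the sentiment lags can then be summarized (e.g. through credible intervals) to judge whether textual sentiment carries information about the latent decision variable beyond the firm-level controls.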

4.3 Going beyond sentiment: role of emotions

Contrary to the most frequent conceptualization of sentiment purely in terms of positive and negative orientations, human emotions can be multidimensional in nature. The psychology literature has established the multifaceted nature of emotions, which can assume various forms (Fig. 9) based not only on polarity but also on the level of arousal [79]. Recognising this limitation, many researchers have tried to broaden the scope of sentiment analysis by incorporating the various emotions suggested by the psychological literature.

Fig. 9 Human emotions (reproduced with permission from Watson and Tellegen [79])

Recently, Kim et al. [80] proposed an interesting approach in which a large set of emotions is assumed to be embedded in a low-dimensional Euclidean space. Instead of the conventional binary (positive or negative) conceptualization of the emotion in a document, they introduce a multivariate response variable that corresponds to a complex emotional state. Consequently, the discrete emotion label \( Y \in \left\{ {1,2, \ldots ,C} \right\} \) for a given document X depends on a position Z on a continuous manifold, \( Z \in R^{l} \), and the distribution of Z given a specific emotion label is assumed to be Gaussian. In other words,

$$ p(Z\left| {Y = y} \right.) \sim N\left( {\mu_{y} ,\varSigma_{y} } \right). $$
(42)

Moreover, the distribution of Z given the document X (represented typically in a bag-of-words form) is assumed to be given by a linear regression model

$$ p(Z\left| {X = x} \right.) \sim N\left( {\theta^{T} x,\varSigma_{x} } \right). $$
(43)

They further assume that the distances between the vectors \( E\left( {Z\left| {Y = y} \right.} \right) \) are similar to the respective distances between the vectors \( E\left( {X\left| {Y = y} \right.} \right) \). Consequently, the parameters \( \mu_{y} = E(Z\left| {Y = y} \right.);y \in C \) can be estimated by applying either multidimensional scaling or Kernel Principal Component Analysis (Kernel PCA) to the set \( \left\{ {\hat{E}(X\left| {Y = y} \right.):y \in C} \right\} \), where \( \hat{E}(X\left| {Y = k} \right.) = \tfrac{1}{{n_{k} }}\sum\nolimits_{{y^{(i)} = k}} {x^{(i)} } \) and \( n_{k} \) is the number of documents belonging to category k. The estimate \( \hat{\theta } \) of the parameter θ can then be found by fitting the regression model of Eq. (43) using a maximum likelihood approach, which gives

$$ \hat{\theta } = \mathop {\text{argmax}}\limits_{\theta } \sum\limits_{i} {\log \int_{Z} {p(z\left| {y^{(i)} } \right.)\frac{{p(y^{(i)} )p_{\theta } (z\left| {x^{(i)} } \right.)}}{{\sum\nolimits_{y} {p(z\left| y \right.)p(y)} }}} } \,dz. $$
(44)

Finally, the covariance matrices \( \varSigma_{y} \) can be estimated by computing the variance of Z values simulated from \( p_{{\hat{\theta }}} (z\left| {x^{(i)} } \right.) \) for all documents having the right labels \( Y^{\left( i \right)} = y \). Once estimated, the parameters \( \hat{\theta } \), \( \mu_{y} \) and \( \varSigma_{y} \) can now be used to predict the most likely emotion class membership of a new document using the following relationship:

$$ \hat{y} = \mathop {\text{argmax}}\limits_{y} \int {p(y\left| z \right.)p_{{\hat{\theta }}} (z\left| x \right.)} \,dz. $$
(45)

These models can be combined very successfully within a temporal or dynamic econometric framework to track how the emotional states of a target group (e.g. consumers, voters) change in response to various actions (e.g. campaigns) and events (e.g. product failures). For example, there have been a number of recent cases where a product has suffered a loss of reputation due to safety issues (e.g. Maggi in India, Samsung Note 7). The corresponding companies would be interested in tracking whether the subsequent measures they took and the related post-event campaigns successfully addressed consumers’ concerns, and whether the corresponding brands/products have recovered from the loss of reputation they suffered. A multidimensional representation of consumers’ moods can potentially be more effective in determining the exact communication strategy (in terms of its content and message) to be adopted.
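
To illustrate how the manifold-based emotion classifier of Eqs. (42)–(45) might be prototyped, the Python snippet below uses scikit-learn and scipy: class centroids are placed in a low-dimensional space with classical MDS, a ridge regression stands in for the linear map θ of Eq. (43), each emotion is modelled as a Gaussian around its embedded centroid, and a new document is assigned to the class with the highest density at its projected position. This is a simplified variant built on our own assumptions (plug-in covariance estimates instead of simulation, and ridge regression instead of the exact maximum likelihood step of Eq. (44)), not the estimator of Kim et al. [80].

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.linear_model import Ridge
from sklearn.manifold import MDS

def fit_emotion_manifold(X, y, n_dims=2, seed=0):
    """Place the class centroids E(X|Y=y) in a low-dimensional space with MDS
    (giving mu_y), regress bag-of-words vectors onto that space (a stand-in for
    theta in Eq. 43), and fit one Gaussian per emotion around its embedded centroid.
    Assumes every emotion class has several training documents."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])   # E(X | Y = y)
    mu = MDS(n_components=n_dims, random_state=seed).fit_transform(centroids)
    reg = Ridge(alpha=1.0).fit(X, mu[np.searchsorted(classes, y)])
    Z_hat = reg.predict(X)
    covs = {c: np.cov((Z_hat[y == c] - mu[i]).T) + 1e-6 * np.eye(n_dims)
            for i, c in enumerate(classes)}
    return classes, mu, covs, reg

def predict_emotion(x_new, classes, mu, covs, reg):
    """Eq. (45) in spirit: pick the emotion whose Gaussian assigns the projected
    document the highest density."""
    z = reg.predict(np.atleast_2d(x_new))[0]
    scores = [multivariate_normal.pdf(z, mean=mu[i], cov=covs[c])
              for i, c in enumerate(classes)]
    return classes[int(np.argmax(scores))]

# Hypothetical toy usage: 40 documents, 12 terms, 4 emotion labels
rng = np.random.default_rng(1)
model = fit_emotion_manifold(rng.random((40, 12)), rng.integers(0, 4, size=40))
print(predict_emotion(rng.random(12), *model))
```

Embedding each period's documents with such a model would yield a trajectory of class-level positions that a dynamic econometric specification could then relate to campaigns and events.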

5 Conclusion

The main objective of this paper has been to present how opinion or sentiment analysis has been used in management research. In contrast to some of the existing reviews that offer a wider theoretical coverage of this process, we adopted a more applied orientation and provided a structured description of the technical details associated with its various steps. Moreover, with specific examples drawn from three important domains of management (marketing, financial markets and accounting), we demonstrated how these techniques have been used in the past. Finally, we presented a number of examples pointing out where opportunities for further research exist. Given that this research area is expanding rapidly, it is virtually impossible to keep track of all the developments that are happening unless one is actively associated with this line of research. Hence, we feel there is a strong need for works like this one, not only to keep researchers in other fields (e.g. management and social sciences) informed about the underlying concepts of this field, but also to provide a more applied outlook on the possibilities that exist in a more concrete manner. The main intention of this paper is thus to give readers a consolidated understanding of the state of the art in this rapidly growing field and to encourage them to think of possible applications in their field of choice.