Abstract
The rapid growth of online activity on the Web motivates us to analyze the reactions to different emotional catalysts on various social networking substrates. In this chapter, we discuss different concepts, motivations, approaches and applications of emotion analysis with respect to its main challenging tasks, such as designing a feature representation schema, emotion classification, holder and topic detection, and identifying co-references among them, as these are the salient points to cover when analyzing emotions in social media. Additionally, we describe and assess a prototype that analyzes emotions, their collective actions based on users and topics, and their components and associations, using different available English and Bengali data sets as case studies. Experiments and final outcomes highlight the promise of the approach and some open research problems.
Keywords
- Emotional Expression
- Natural Language Processing
- Conditional Random Field
- Discourse Marker
- Emotion Analysis
1 Introduction
In recent times, research on opinion, sentiment and emotion in natural language texts and other media has been gaining ground under the umbrella of Subjectivity Analysis and Affective Computing. Subjectivity Analysis aims to identify whether a sentence expresses an opinion and, if so, whether the opinion is positive or negative (Liu 2010). Emotions are subjective feelings and thoughts, and the strengths of opinions are closely related to the intensities of certain emotions, e.g., joy and anger (Liu 2010). Although the concepts of emotion and opinion are not equivalent, they overlap considerably. Emotion analysis has also been studied in Affective Computing, a key research area in computer science, as well as in fields such as hermeneutics, psychology, philosophy, sociology, biology and political science.
The 24/7 news sites and blogs facilitate the expression and shaping of opinion and emotion both locally and globally (Ahmad 2011). Emails, Weblogs, chat rooms, online forums and even Twitter are considered effective social media for discussing recent topics. Blogs are the most important, communicative and informative repository of text-based emotional content in Web 2.0 (Yang et al. 2007). Facebook, LinkedIn and Google+ also contain blog-like structures. Many blogs act as online diaries in which bloggers report their daily activities and surroundings, and blog posts are sometimes commented on by other bloggers. Blogs are therefore regarded as personal journals in which people express their opinions on diverse aspects such as products, places they have visited, politics and current events in society. Blog posts contain instant, updated or influenced views on one or more topics.
Nowadays, people post product reviews on merchant sites and express their views on discussion forums, blogs and social network sites. With such huge amounts of social text being generated, it is important to find methods that can annotate and organize documents in meaningful ways. Topic identification is one such method, and it is also used for document ranking in information retrieval systems. In addition to the content of a document itself, other relevant information about it, such as related topics, can often enable faster and more effective search or classification.
Topic identification from social sites is also essential for categorizing search applications (Stein and Eissen 2004). Categorizing search means applying text categorization facilities to retrieval tasks in which a large number of documents are returned. Categorizing search has attracted much interest recently; its potential has been recognized by users and search engine developers alike. Over the last few years, the categorization of Web documents has also begun to take their opinionated, sentiment-bearing or emotional content into account (Footnotes 1 and 2).
From the perspective of natural language processing (NLP) applications, emotion analysis has been considered a sub-discipline at the crossroads of information retrieval (Pang and Lee 2008; Sood and Vasserman 2009) and computational linguistics (Wiebe et al. 2005). Emotions, of course, are not linguistic things; however, the most convenient access we have to them is through language (Strapparava and Valitutti 2004). Natural language texts contain not only informative content but also, to a greater or lesser degree, attitudinal private information, including emotions. The classification of reviews (Turney 2002) and newspaper articles (Lin et al. 2007) increasingly incorporates emotion analysis within its scope. However, identifying emotions from texts is not easy, because emotions are not open to objective observation or verification (Quirk et al. 1985). Moreover, the same textual content can be presented with different emotional slants (Grefenstette et al. 2004).
Sentiment or emotion is often described as a localized phenomenon that is more appropriately computed at the paragraph, sentence or entity level (Liu 2009). However, emotion analysis can be performed at several levels of granularity: word, phrase, sentence, clause, paragraph or document (Das and Bandyopadhyay 2009, 2010a). Yu (2009) similarly proposed the granularity levels of term, expression, statement, passage and document.
The topics discussed at the sentence level are sometimes different from the topic of the overall document. It is therefore important to find methods that can annotate documents in meaningful ways so that topic identification can also be used for document ranking in information retrieval systems. Lin (1997) offered various insights into, and solutions for, topic identification. Chesley et al. (2006) presented experiments on subjectivity and polarity classification of topic- and genre-independent blog posts. Emotion analysis also involves identifying the emotion holder in addition to the emotion topic. An emotion agent or holder is the person or organization expressing the emotion (Wiebe et al. 2005); in product reviews and blogs, holders are usually the authors of the posts. Extracting the emotion holder is important for discriminating between emotions that are viewed from different perspectives (Seki 2007). By grouping emotion holders with different stances on diverse social and political issues, we can better understand the relationships among countries or organizations (Kim and Hovy 2006).
Sentences in a document, especially on social sites, may not contain any direct clue for the emotional expression (“Dream of music is in their eyes and hearts.”), or they may contain an emotional expression without an explicit holder (“His acting was really attractive”) or without an explicit topic (“I fall into cry”). With such examples and problems in mind, it can be hypothesized that the notion of user-topic co-reference will facilitate both the manual and the automatic identification of emotions. We therefore consider the emotional expression, the holder and the topic as the three essential components of emotion (Das and Bandyopadhyay 2010d).
Tracking emotions over events or about politics as expressed in online forums or news, and determining the emotion holder and topic, are important tasks whose uses extend to customer relationship management. Users’ emotions expressed in online forums, blogs, Twitter messages or social networking sites can thus be tracked for applications such as sentiment review, customer management and stock exchange prediction. Identifying the temporal trends of emotions and topics has recently drawn the attention of the NLP community (Fukuhara et al. 2007; Das and Bandyopadhyay 2011a), because people’s emotions have a great influence on society (Das and Bandyopadhyay 2011b).
Apart from the commercial perspectives, the other potential contributions of this chapter in relation to the book are as follows:
-
We aim to identify emotional changes among users during their communication in social networks. We mainly focus on blog users and their topics of discussion in order to identify emotional changes over time. We consider the social interactions of the bloggers as collective actions and measure the emotional changes resulting from such actions from the perspectives of topic and time.
-
We incorporate knowledge of two types of stimuli, self affects and influential affects, in identifying the behavioural contingencies in the social interactions of blog users.
-
We consider collective actions as events, and therefore the emotional change with respect to events has been represented using the graphical notion of sentiment event tracking.
Based on the above issues, this chapter is organized as follows: Section 2 describes the concepts and motivations related to emotion analysis. Section 3 discusses a prototype for emotion analysis that identifies several components of emotion and explains the need for emotional co-reference among these components. Section 4 describes the application of the system to time-based emotion tracking. Finally, Section 5 highlights the main conclusions of this chapter.
2 Concepts and Motivations
Several frameworks exist from various fields of academic study, such as cognitive science, linguistics and psychology, that can inform and augment analyses of sentiment, opinion and emotion (Read and Carroll 2010). Some of the generalized definitions of emotions are as follows:
Definition 1:
Emotion is a complex psycho-physiological experience of an individual’s state of mind as it interacts with biochemical (internal) and environmental (external) influences. In humans, emotion fundamentally involves physiological arousal, expressive behaviours and conscious experience (Myers 2004).
Definition 2:
In psychology and common use, emotion is an aspect of a person’s mental state of being, normally based on or tied to the person’s internal (physical) and external (social) sensory feeling (Zhang et al. 2008).
Emotions, of course, are not linguistic objects/entities, and it is also said that the most non-phenomenological access to emotion is language (Ortony and Turner 1990). But the identification of emotions from texts is not an easy task due to the following challenges:
-
Basic and complex categories of emotions (James 1884; McDougall 1926; Watson 1930; Arnold 1960; Izard 1971; Plutchik 1980; Ekman 1992; Parrott 2001 and several others)
Although there are several other theories of emotion, the debate centres on basic and complex categories, where complex emotions may arise from cultural conditioning or association combined with basic emotions. However, researchers have still not agreed on a set of basic human emotions. Ekman (1992), for instance, derived a list of six basic emotions from subjects’ facial expressions, which several researchers have employed as classes in affect recognition tasks. We have therefore confined ourselves to these six emotion classes.
-
Restricted access in case of objective observation or verification (Quirk et al. 1985)
-
Presentation of the same textual content with different emotional slants (Grefenstette et al. 2004)
Although we consider only six basic emotions, a large number of language expressions can be used to express them.
-
Level of granularity for processing of emotional evaluative expressions (word/phrase/clause/sentence/paragraph/document) (Wiebe et al. 2005; Das and Bandyopadhyay 2009, 2010a; Liu 2009; Yu 2009)
-
Valence (positive, negative, neutral) or Ekman’s six emotion types: “anger”, “disgust”, “fear”, “joy”, “sadness”, “surprise”
-
Intensity (low, medium, high, etc.)
-
Aspects and attributes (Holder/Target/Topic) (Kim and Hovy 2006; Das and Bandyopadhyay 2010b, c)
It is thus very hard to define emotion and to identify the factors that regulate or control it. These aspects call for syntactic, semantic and pragmatic analysis of a text (Polanyi and Zaenen 2006).
However, the majority of emotion-related subjectivity analysis methods are based on textual keyword spotting using specific lexical resources. SentiWordNet (Baccianella et al. 2010) is a lexical resource that assigns positive, negative and objective scores to each WordNet synset (Miller 1995). The subjectivity wordlist (Banea et al. 2008) labels words with strong or weak subjectivity and with prior polarities of type positive, negative or neutral. Other well-known sentiment lexicons include the General Inquirer System (Stone 1966), the Subjective adjective list (Baroni and Vegnaduzzo 2004), English SentiWordNet (Esuli and Sebastiani 2006) and Taboada’s adjective list (Voll and Taboada 2007). However, all these resources are in English and have been used for coarse-grained sentiment analysis (e.g., positive, negative or neutral). The characterization of words and phrases according to their emotive tone was first carried out in Turney (2002).
More recently, the use of Amazon Mechanical Turk for generating emotion lexicons (Mohammad and Turney 2010) has shown promising results. Opinion and emotion annotation has been performed for several domains, such as news (Strapparava and Mihalcea 2007) and blogs (Mishne and de Rijke 2006a, b). Ku et al. (2006) discuss opinion mining at the word, sentence and document levels, along with opinion summarization, on news and Weblog documents. To estimate the affect in text, the model proposed in Neviarouskaya et al. (2007) processes symbolic cues and employs NLP techniques for word-, phrase- and sentence-level analysis. Several machine-learning techniques have been applied to blog data to identify the mood of the authors at the time of writing (Mishne and de Rijke 2006a, b). Text-based emotion prediction using supervised machine learning based on the SNoW learning architecture is discussed in Alm et al. (2005).
Prior work on identifying opinion or emotion holders has sometimes identified only a single opinion per sentence (Bethard et al. 2004) and sometimes several (Choi et al. 2005). Wiebe et al. (2005) attempted to identify opinion holders for question answering, with supporting annotation tasks. The techniques employed to detect holders include labelling the arguments of verbs with their semantic roles (Swier and Stevenson 2004), syntactic models (Das and Bandyopadhyay 2010b), an emotion knowledge base (Hu et al. 2006) and machine-learning-based classification (Evans 2007). The anaphora resolution-based opinion holder identification method also exploits lexical and syntactic information (Kim et al. 2007).
Emotion topic, on the other hand, can be defined as the real-world object, event or abstract entity that is the primary subject of the emotion or opinion as intended by its holder (Stoyanov and Cardie 2008a). The topic depends on the context in which its associated emotional expression occurs (Stoyanov and Cardie 2008b). In the related area of opinion topic extraction, several researchers have contributed, for example Kobayashi et al. (2004) and Popescu and Etzioni (2005). However, all these works are either based on lexicon lookup or applied to the domain of product reviews. Opinion topics are not necessarily spatially coherent in social texts: a sentence may contain two opinions on different topics, and opinions on the same topic may be separated by opinions that do not share that topic (Stoyanov and Cardie 2008a). Besides topics, the relation between sentiment and event can also be identified using lexical equivalence and co-reference approaches (Kolya et al. 2011).
Most existing work in this field has been conducted for English. To the best of our knowledge, based on the related work discussed above, only a few corpora or systems are currently available for analyzing emotions in languages other than English. In this chapter, we consider the Bengali language for our case studies. Bengali is the fifth most widely spoken language in the world, the second in India and the national language of Bangladesh. Hence, the present work is also a foray into emotion analysis for an Indian language.
In summary, the perspectives of sociology, psychology and commerce, along with the close association among people, topics and sentiments, motivate us to investigate people’s emotional changes over topic and time.
3 Approaches to Identifying Emotion Components
There are two main approaches to emotion analysis. A “rule-based” approach defines markers and linguistic syntactic rules to determine the emotions of a text (Mihalcea et al. 2007) and addresses the semantics associated with emotions by using conceptual resources such as lexicons. A “corpus-based” approach, on the other hand, uses an annotated corpus to build a system that identifies emotions; here, the choice of language affects only the choice of corpus. However, most successful approaches depend on syntactic rules or on the text semantics of keywords or phrases.
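To make the rule-based idea concrete, here is a minimal keyword-spotting sketch. The lexicon entries, the negation list and the three-token negation window are all hypothetical choices for illustration; a real system would draw on resources such as WordNet Affect and on parser-derived rules rather than on this toy table.

```python
# Hypothetical toy lexicon mapping words to Ekman classes (illustration only;
# not taken from WordNet Affect or any other published resource).
EMOTION_LEXICON = {
    "happy": "joy",
    "love": "joy",
    "angry": "anger",
    "hate": "anger",
    "afraid": "fear",
    "sad": "sadness",
    "disgusting": "disgust",
    "amazed": "surprise",
}

# Assumed negation cue list.
NEGATIONS = {"no", "not", "never", "nor", "neither"}


def tag_words(tokens):
    """Return (token, emotion or None, negated) for every token.

    A token is tagged when it appears in the lexicon; it is marked as negated
    when any of the three preceding tokens is a negation word (assumed rule).
    """
    tagged = []
    for i, tok in enumerate(tokens):
        emotion = EMOTION_LEXICON.get(tok.lower())
        window = {t.lower() for t in tokens[max(0, i - 3):i]}
        negated = emotion is not None and bool(window & NEGATIONS)
        tagged.append((tok, emotion, negated))
    return tagged
```

For example, `tag_words("I am not happy but I love the song".split())` tags happy as joy under negation and love as plain joy. A corpus-based system would instead learn such decisions from annotated data.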
3.1 Emotional Expression
As noted in Section 1, sentiment and emotion are often described as localized phenomena that are more appropriately computed at the paragraph, sentence or entity level (Liu 2009), although emotion analysis can be performed at several levels of granularity, from words and phrases to sentences and documents. Accordingly, prototype systems can be built to identify evaluative emotional expressions at the word (W), phrase (P), sentence (S) and document (D) levels (Das and Bandyopadhyay 2010a, f, 2011c).
A baseline system for word-level emotion tagging, used to measure performance for each emotion class, was developed in Das and Bandyopadhyay (2010a). Each word of the corpus is passed through six separate modules, one per emotion class, to tag it with the appropriate class label. In addition, conditional random field (CRF) (Lafferty et al. 2001) and support vector machine (SVM) (Joachims 1998) classifiers are employed for word-level emotion tagging. Different singleton features (e.g., part of speech (POS) of the words, question words, reduplication, colloquial/foreign words, special punctuation symbols, negations and emoticons) and context features (e.g., unigrams and bigrams) at the word and POS-tag levels, along with their combinations, are used for training and testing. Some of the common features are as follows:
-
Bag of Words: Considers all terms in the corpus and builds a vector per document, where each dimension expresses the presence or absence of a term in the document. This representation can be refined by weighting each term by its importance in the corpus (e.g., TF-IDF).
-
Emotion/Affect Words (EW): The presence of a word in the WordNet Affect lists (Strapparava and Valitutti 2004) identifies the emotion/affect words.
-
Parts of Speech (POS): We are interested in the verb, noun, adjective and adverb words as these are emotion-informative constituents.
-
Bag of features: A set of features is selected, e.g., units with high frequency in the corpus such as unigrams, n-grams, POS tags or adjectives. Each feature is usually assigned one dimension of a vector representing the text fragment, and that dimension typically records only the feature’s presence or frequency.
-
Intensifiers (INTF): Dependency relations such as amod() [adjectival modifier] and advmod() [adverbial modifier] containing JJ (adjective) and RB (adverb) tagged elements are considered as intensifiers.
-
Direct and Transitive Dependency relations (DD and TD): The direct dependency (DD) is identified based on the simultaneous presence of the emotion word and the other word in the same dependency relation, whereas the transitive dependencies (TD) are verified if they are connected via one or more intermediate dependency relations.
-
Negations (NEG): Dependency relations such as neg_() [negation modifier], or negative words from a manually prepared list (no, not, nor, neither, etc.), as in “there’s no way I’m turning it down”, are used for the Negations feature.
-
Conjuncts (CONJ): The dependency relation conj_() [conjunct] identifies the Conjuncts feature (and, or, but), e.g., “But I asked him about WHY he doesn’t cook for himself”.
-
Punctuation Symbols (Sym): Symbols such as the comma (,), exclamation mark (!) and question mark (?), used singly or repeated, often emphasize emotional expressions and are considered crucial clues to the presence of emotion (“I can’t believe she is FINALLY here!!!”).
-
Discourse Markers (DM): The present task aims to identify only the explicit discourse markers that are tagged by conjunctive_() or mark_() type dependency relations of the parsed constituents (e.g., as, because, while, whereas).
-
Capitalized Phrases (CP): A capitalized word or a long capitalized phrase segment, e.g., “I forgot how demeaning BME classes are” or “Terrorists MAKE ME SICK, they ought to all be horrifically detained”, is considered as the Capitalized Phrases feature.
-
Emoticons (emot_icon): Emoticons, especially when they occur consecutively, generally carry as much sentiment as real sentiment words and reinforce the words that precede or follow them.
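Several of the surface features listed above can be computed without a parser. The sketch below extracts a bag of words and counts for the EW, NEG, Sym, CP and emot_icon features from a single sentence. The affect-word set and the emoticon regular expression are hypothetical stand-ins (not the WordNet Affect lists), and the dependency-based features (INTF, DD, TD, CONJ, DM) are deliberately omitted because they require parsed input.

```python
import re
from collections import Counter

# Hypothetical stand-ins for curated resources.
NEGATION_WORDS = {"no", "not", "nor", "neither", "never"}
AFFECT_WORDS = {"happy", "sick", "love", "hate", "demeaning"}  # not WordNet Affect
EMOTICON_RE = re.compile(r"[:;]-?[()DP]")  # assumed pattern for a few ASCII emoticons


def surface_features(sentence):
    """Compute parser-free features for one sentence."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    lower = [t.lower() for t in tokens]
    return {
        # Bag of words: frequency of each lower-cased token.
        "bow": Counter(lower),
        # EW: number of tokens found in the affect-word set.
        "EW": sum(t in AFFECT_WORDS for t in lower),
        # NEG: number of negation words from the manual list.
        "NEG": sum(t in NEGATION_WORDS for t in lower),
        # Sym: runs of two or more '!' or '?' (emphatic punctuation).
        "Sym": len(re.findall(r"[!?]{2,}", sentence)),
        # CP: fully capitalized words longer than one letter (skips "I").
        "CP": sum(t.isupper() and len(t) > 1 for t in tokens),
        # emot_icon: emoticons matched by the assumed pattern.
        "emot_icon": len(EMOTICON_RE.findall(sentence)),
    }
```

On “Terrorists MAKE ME SICK, they ought to all be horrifically detained”, the sketch counts three capitalized words and one affect word; in a real system these counts would be combined with the parser-based features before training the CRF or SVM.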
Lexical analysis plays a crucial role in identifying emotions in text. For example, words like love, hate, good, bad, happy and sad directly indicate emotion. However, this assumption holds only within a limited context and at restricted granularity. Consider the following example:
Example 1
“Though Mr. Jonathon Read could be generally happy about his car, his wife might be dissatisfied by the engine noise”.
If the sentence were rewritten by swapping only the positions of the direct sentiment words (happy and dissatisfied), its meaning would change completely, even though it contains exactly the same emotion words:
“Though Mr. Jonathon Read could be generally dissatisfied about his car, his wife might be happy by the engine noise”.
The primary factors for disambiguation are therefore the components associated with the emotional expressions. In the example, Mr. Jonathon Read and his car are associated with the emotional expression happy, whereas his wife and engine noise are associated with the emotional expression dissatisfied. Mr. Jonathon Read and his wife are the emotion holders, whereas his car and engine noise are the respective emotion topics.
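The point of Example 1 can be checked mechanically: a bag of emotion words cannot tell the two versions apart, whereas (holder, expression, topic) triples can. In this sketch the triples are annotated by hand, and the `EmotionComponent` type and the two-word lexicon are illustrative assumptions, not part of the original system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionComponent:
    """Hypothetical (holder, expression, topic) triple; frozen so it is hashable."""
    holder: str
    expression: str
    topic: str


ORIGINAL = ("Though Mr. Jonathon Read could be generally happy about his car, "
            "his wife might be dissatisfied by the engine noise")
SWAPPED = ("Though Mr. Jonathon Read could be generally dissatisfied about his car, "
           "his wife might be happy by the engine noise")

EMOTION_WORDS = {"happy", "dissatisfied"}  # toy lexicon for this example


def emotion_word_bag(text):
    """Count emotion words, ignoring their positions and associations."""
    words = [w.strip(".,").lower() for w in text.split()]
    return Counter(w for w in words if w in EMOTION_WORDS)


# Components annotated by hand for each version (not produced automatically).
original_components = {
    EmotionComponent("Mr. Jonathon Read", "happy", "his car"),
    EmotionComponent("his wife", "dissatisfied", "engine noise"),
}
swapped_components = {
    EmotionComponent("Mr. Jonathon Read", "dissatisfied", "his car"),
    EmotionComponent("his wife", "happy", "engine noise"),
}
```

Comparing `emotion_word_bag(ORIGINAL)` with `emotion_word_bag(SWAPPED)` yields equal counters, while the two component sets differ, which is exactly why the holder and topic must be identified alongside the expression.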
3.2 Emotion Holder
A baseline model (BM) for identifying emotion holders in English can be designed based on the subject information of parsed emotional sentences; the Stanford dependency parser (Footnote 3) can be employed for this task. Similarly, for morphologically rich languages such as Bengali, sentences are passed through an open-source Bengali shallow parser that produces morphological information (e.g., root, case, vibhakti, tam, suffixes). Lexical pattern-based, phrase-level similarity clues containing different POS combinations, named entities (NEs) and noun phrases are then used to identify the emotion holders.
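A minimal sketch of the subject-based baseline, assuming Stanford-style dependency triples of the form (relation, head, dependent). The triples in the usage example are written by hand rather than produced by a parser, and the fallback to the sentence's first nsubj is an assumption, not part of the described model.

```python
from typing import List, Optional, Set, Tuple

Dependency = Tuple[str, str, str]  # (relation, head, dependent)


def baseline_holder(deps: List[Dependency], emotion_words: Set[str]) -> Optional[str]:
    """Return the nominal subject (nsubj) attached to an emotion-bearing head.

    If no subject is attached to an emotion word, fall back to the first
    nsubj in the sentence (assumed fallback). Returns None when the sentence
    has no nsubj at all, e.g. in passive constructions.
    """
    subjects = [(head, dep) for rel, head, dep in deps if rel == "nsubj"]
    for head, dep in subjects:
        if head.lower() in emotion_words:
            return dep
    return subjects[0][1] if subjects else None
```

With hand-written triples for “Mary loves the new song”, `baseline_holder([("nsubj", "loves", "Mary"), ("dobj", "loves", "song")], {"loves"})` returns "Mary". For the passive “The actor was loved by millions”, a Stanford-style analysis uses nsubjpass and agent rather than nsubj, so the function returns None, which mirrors the baseline's weakness on passive sentences.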
Emotion holders can also be identified from the syntactic argument structure of emotional sentences. In English, the head of each chunk in the dependency-parsed output helps to construct the syntactic argument structure with respect to the key emotional verb. Two separate techniques can be used to extract the argument structure: one takes it directly from the parsed result, and the other from a corpus that has been POS-tagged and chunked separately. The verb-based argument structures are also acquired from chunk- or phrase-level lexical patterns (Kim and Hovy 2006; Choi et al. 2005; Das and Bandyopadhyay 2010b). The syntactic model (SynM) rests on the hypothesis followed in Banerjee et al. (2010): if the acquired syntactic argument structure of a sentence matches any of the frame syntaxes retrieved from VerbNet (Footnote 4), the holder roles (e.g., Experiencer, Agent, Actor, Beneficiary) associated with that VerbNet frame syntax are assigned to the appropriate slots in the syntactic arguments of the sentence. For other languages, each acquired syntactic argument structure is mapped to all the possible frame syntaxes of the corresponding verb in the English VerbNet.
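The frame-matching hypothesis of the syntactic model can be sketched as a lookup: compare the sentence's argument pattern with each frame syntax listed for the verb and, on a match, read off the slot that carries a holder role. The frame table below is hand-written and hypothetical, loosely in the style of VerbNet frames; it is not extracted from VerbNet, and real argument structures come from the parser or chunker described above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, hand-written frame table (NOT extracted from VerbNet).
# Each entry pairs a syntactic pattern with the thematic role of each slot.
FRAMES: Dict[str, List[Tuple[Tuple[str, ...], Tuple[str, ...]]]] = {
    "love": [(("NP", "V", "NP"), ("Experiencer", "V", "Stimulus"))],
    "frighten": [(("NP", "V", "NP"), ("Stimulus", "V", "Experiencer"))],
}

# Holder roles named in the text.
HOLDER_ROLES = {"Experiencer", "Agent", "Actor", "Beneficiary"}


def assign_holder(verb: str,
                  argument_structure: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Match (category, phrase) slots against the verb's frames.

    Returns (phrase, role) for the first slot carrying a holder role in the
    first matching frame, or None if no frame syntax matches.
    """
    pattern = tuple(category for category, _ in argument_structure)
    for syntax, roles in FRAMES.get(verb, []):
        if syntax == pattern:
            for (_, phrase), role in zip(argument_structure, roles):
                if role in HOLDER_ROLES:
                    return phrase, role
    return None
```

The design choice worth noting is that the same NP V NP pattern yields different holders for different verbs: the subject for love, but the object for frighten, which a subject-only baseline cannot capture.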
3.3 Emotion Topic
As with emotion holders, the baseline model for identifying emotion topics is based on object-related dependency relations in the parsed output. Phrase segments containing topic-related thematic roles (e.g., Topic, Theme, Event) are extracted from the verb-based syntactic argument structures of the sentences. A supervised model (SvdM), on the other hand, is adopted to identify multiple emotion topics, along with their individual topic and target spans, in each sentence. CRF, SVM and Fuzzy Classifier (FC) are employed with various features (e.g., the annotated emotional expressions along with direct and transitive dependencies, causal verbs, discourse markers, emotion holders, named entities and four similarity measures: Structural, Sentiment, Syntactic and Semantic Similarity) and their combinations (Kim and Hovy 2006; Das and Bandyopadhyay 2010c). A prototype system for identifying sentiment is shown in Fig. 1, taken from Kim and Hovy (2006). Incorporating a special feature, Structural Similarity, which is based on Rhetorical Structure Theory (Mann and Thompson 1988), improves the topic identification system.
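The supervised model combines several classifiers, and the results reported for it come from a multi-engine voting system. A minimal sketch of per-token majority voting over the outputs of the three engines follows. The BIO-style labels (B-TOP, I-TOP, O) and the rule of breaking ties in favour of a fixed engine priority are assumptions for illustration, not details given in the chapter.

```python
from collections import Counter
from typing import Dict, List


def vote(predictions: Dict[str, List[str]], priority: List[str]) -> List[str]:
    """Combine per-token labels from several engines by majority vote.

    predictions maps an engine name (e.g. "CRF", "SVM", "FC") to its label
    sequence; priority lists the engines in tie-breaking order (assumed rule).
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all engines must label the same tokens")
    n = lengths.pop()
    combined = []
    for i in range(n):
        counts = Counter(predictions[name][i] for name in priority)
        best = max(counts.values())
        # Among the labels with the top count, keep the one proposed by the
        # highest-priority engine.
        for name in priority:
            if counts[predictions[name][i]] == best:
                combined.append(predictions[name][i])
                break
    return combined
```

For instance, if CRF and FC both label a token I-TOP while SVM labels it O, the combined output is I-TOP; only when all three disagree does the priority order decide.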
3.4 Need of Emotion Co-reference
The importance of emotion-associated components such as the holder and topic can easily be verified by swapping their positions while keeping the positions of their corresponding emotional expressions intact. Example 1 yields the following variations:
1. “Though his wife could be generally happy about his car, Mr. Jonathon Read might be dissatisfied by the engine noise”.
2. “Though Mr. Jonathon Read could be generally happy about the engine noise, his wife might be dissatisfied by his car”.
3. “Though his wife could be generally happy about the engine noise, Mr. Jonathon Read might be dissatisfied by his car”.
It is therefore clear that a proper understanding of the emotion components and their associations is essential for mining emotion from texts. For this very reason, document-level emotion classification sometimes fails to detect emotion, as it does not account for the individual emotion components. A single topic may be co-referred to by several users, and multiple topics may be co-referred to by a single user. Fig. 2 plots Ekman’s six emotions for 8 different topics as referred to by each of 22 bloggers, showing how the user-topic co-reference system captures the emotional views of the bloggers and their dependence on the associated topics. With the above examples and problems in mind, it can be hypothesized that co-reference among the emotion components will facilitate both the manual and the automatic identification of emotions.
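A plot like Fig. 2 rests on aggregating emotion-annotated mentions by user and topic. This sketch, using hypothetical (user, topic, emotion) records, builds per-user, per-topic emotion counts and lists which users co-refer to each topic; it illustrates the data organization only and does not reproduce the chapter's co-reference system.

```python
from collections import Counter, defaultdict

EKMAN = ("anger", "disgust", "fear", "joy", "sadness", "surprise")


def user_topic_profile(records):
    """Aggregate (user_id, topic, emotion) records.

    Returns a mapping {(user_id, topic): Counter(emotion -> count)}.
    """
    profile = defaultdict(Counter)
    for user, topic, emotion in records:
        if emotion not in EKMAN:
            raise ValueError(f"unknown emotion: {emotion}")
        profile[(user, topic)][emotion] += 1
    return profile


def users_per_topic(profile):
    """Map each topic to the set of users who refer to it."""
    topics = defaultdict(set)
    for user, topic in profile:
        topics[topic].add(user)
    return topics
```

A topic whose user set has more than one member is co-referred to by several bloggers, and a user who appears under several topics co-refers to multiple topics; the per-pair counters are what a per-topic emotion plot would display.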
3.5 Results
Various experiments with symbolic, language-dependent and domain-dependent features were carried out to evaluate the word-level emotion classification system. Lexical features (e.g., POS, and words from SentiWordNet and WordNet Affect) significantly outperform the other features, and different combinations of context features also show significant improvements in performance. Although all our systems were evaluated on large amounts of data, with the corresponding results reported in several research articles (Das and Bandyopadhyay 2009, 2010a, b, c, d, e, f, 2011b, c), we report here only the best results on small test data, as follows:
The word-level tagging system achieved an average F-score of 83.65 % on 1,500 word tokens of an English news corpus. In sentential emotion tagging, an average F-score of 65 % was achieved on 200 test sentences of the SemEval 2007 corpus. A supervised system for English blogs (Aman and Szpakowicz 2007) outperforms the baseline system and achieves average F-scores of 82.72 %, 76.74 % and 89.21 % for emotional expressions, sentential emotions and intensities, respectively, on 565 gold-standard test sentences.
The baseline model for English has difficulty identifying emotion holders in passive sentences. On a collection of 4,112 emotional sentences, the dependency parser-based method achieves a better F-score (66.98 %) than the other method (62.39 %), mostly because the second method fails to distinguish arguments from adjuncts. The maximum average F-scores of the baseline and hybrid systems for emotion topic identification are 56.75 % and 58.88 %, respectively, on 500 sentences (Table 1).
The supervised multi-engine voting system achieves F-scores of 70.51 % and 90.44 % for topic and target span identification, respectively, on blog sentences. The syntactic system, however, makes errors in several cases (e.g., appositive constructions, co-reference with the emotional expression, multiple holders and topics, overlapping topic spans and anaphoric holders), so some simple rule-based error reduction techniques based on rhetorical structure and emotional expressions are employed in the syntactic system.
Errors occur mostly for metaphoric and colloquial usages, unstructured sentences (e.g., “Really starting to lose it”) and sentences containing typographic errors (e.g., “she’s feeling very goooood about herself”). In micro-blogging services such as Twitter, the number of misspelled words is even higher due to the 140-character limit. To test the robustness of the proposed approaches, we plan to apply them to annotate a Twitter-extracted corpus.
4 Application in Emotion Tracking
4.1 Tracking Bloggers’ Emotions
Blog documents are generally stored in the format shown in Fig. 3. Each blog document is assigned a unique identifier (docid#), followed by a section devoted to topics and several sections devoted to different users’ comments. Each comment section may contain several nested and overlapping subsections that also hold bloggers’ comments. Each comment section of an individual blogger is uniquely identified by a section identification number (secid#), and each section records the identification number of the blog user (uid#) and the associated timestamp (tid#).
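One possible in-memory representation of this format is sketched below. The class and field names mirror the identifiers in the text (docid, secid, uid, tid), but the representation itself, including integer timestamps, is an assumption; the traversal flattens the nested comment tree depth-first so that every section can be processed uniformly.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class CommentSection:
    secid: int      # section identification number
    uid: int        # blog user identifier
    tid: int        # timestamp, assumed here to be an orderable integer
    text: str
    replies: List["CommentSection"] = field(default_factory=list)


@dataclass
class BlogDocument:
    docid: int
    topic: str
    comments: List[CommentSection] = field(default_factory=list)


def iter_sections(doc: BlogDocument) -> Iterator[Tuple[int, int, int, str]]:
    """Depth-first, pre-order walk yielding (secid, uid, tid, text)."""
    stack = list(reversed(doc.comments))
    while stack:
        section = stack.pop()
        yield section.secid, section.uid, section.tid, section.text
        # Push replies in reverse so they are visited in their original order.
        stack.extend(reversed(section.replies))
```

An explicit stack is used instead of recursion so that deeply nested comment threads cannot exhaust Python's recursion limit.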
If each individual comment section is treated as a separate paragraph containing several emotional sentences, the emotions present in that section represent the emotional state of the blogger at that timestamp. A Referential Informative Chain (RIC) is constructed for each blogger from the default annotated information, such as the timestamp (tid#), the user identifier (uid#) and the emotional comments, acquired from the nested tree-like structure of the comment sections. Each node of an RIC denotes the emotional state of the blogger at a particular time instance, and nodes are added in ascending order of their timestamps. For example, in Fig. 2, the two nodes n1 and n2 will be added to the RIC of the blog user with uid = 1, together with their associated timestamps (t1, t4) and emotions. As t4 > t1, node n1 is included before n2 in the RIC of uid 1.
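Building RICs amounts to grouping comment sections by blogger and ordering each group by timestamp. The sketch below assumes that each section has already been assigned an emotion label and that timestamps are comparable integers; `build_rics` and `emotion_changes` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Node = Tuple[int, int, str]  # (secid, tid, emotion)


def build_rics(sections: Iterable[Tuple[int, int, int, str]]) -> Dict[int, List[Node]]:
    """Build one chain per blogger from (secid, uid, tid, emotion) tuples.

    Nodes in each chain are ordered by ascending timestamp.
    """
    rics: Dict[int, List[Node]] = defaultdict(list)
    for secid, uid, tid, emotion in sections:
        rics[uid].append((secid, tid, emotion))
    for chain in rics.values():
        chain.sort(key=lambda node: node[1])
    return dict(rics)


def emotion_changes(chain: List[Node]) -> List[Tuple[int, str, str]]:
    """List (tid, previous_emotion, new_emotion) wherever consecutive nodes differ."""
    return [(b[1], a[2], b[2]) for a, b in zip(chain, chain[1:]) if a[2] != b[2]]
```

For the example in the text, sections at t1 and t4 for uid 1 yield a two-node chain with n1 first, regardless of the order in which the sections were read, and `emotion_changes` reports a transition at t4 only if the two emotions differ.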
Ekman’s six basic emotions are identified in the bloggers’ comments at sentence- and paragraph-level granularity. An affect-scoring technique is employed to identify the emotion of each state (node) in the bloggers’ RICs. Two types of affect scores, the Self Affect Score (SAS) and the Influential Affect Score (IAS), are used to produce the Emotional Score (ES) of a blogger at each node with a particular timestamp (Das and Bandyopadhyay 2011b).
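The chapter does not give the formula that combines SAS and IAS, so the following is only a hedged illustration of the idea: it assumes, hypothetically, that both scores are per-emotion dictionaries over Ekman's six emotions, that ES is their weighted sum with an invented parameter `influence_weight`, and that a node is labelled with its highest-scoring emotion. The actual scoring is defined in the cited work.

```python
# Ekman's six basic emotions ("joy" stands for happiness).
EKMAN_EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")


def emotional_score(sas, ias, influence_weight=0.5):
    """Hypothetical ES: SAS plus a weighted IAS, computed per emotion.
    Emotions missing from a score dictionary count as 0."""
    return {
        emotion: sas.get(emotion, 0.0) + influence_weight * ias.get(emotion, 0.0)
        for emotion in EKMAN_EMOTIONS
    }


def node_emotion(sas, ias, influence_weight=0.5):
    """Label an RIC node with its highest-scoring emotion; ties are broken
    by the order of EKMAN_EMOTIONS."""
    es = emotional_score(sas, ias, influence_weight)
    return max(EKMAN_EMOTIONS, key=es.__getitem__)


# A blogger whose own words lean towards joy, replying to a sad thread.
sas = {"joy": 0.6, "sadness": 0.2}
ias = {"sadness": 0.9}
print(node_emotion(sas, ias, influence_weight=0.0))  # joy
print(node_emotion(sas, ias, influence_weight=0.5))  # sadness
```

Setting `influence_weight` to 0 reduces the node label to the blogger's own affect; larger values let the surrounding discussion pull the label towards the influential affect.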
Changes in a blogger’s emotions are tracked from the emotions assigned to the nodes of the blogger’s RIC. The contribution of self and influential affects to the tracking results is assessed with two evaluation techniques, extrinsic and intrinsic. In the extrinsic evaluation, the system achieves precision (P), recall (R) and F-score of 61.05 %, 69.81 % and 65.13 %, respectively; in the intrinsic evaluation, it obtains satisfactory average scores of 0.67 for nominal alpha (Nα) and 0.71 for interval alpha (Iα) (Das and Bandyopadhyay 2011b).
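As a quick consistency check on the extrinsic figures, the F-score is the harmonic mean of precision and recall; plugging in the reported values gives:

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 61.05 \times 69.81}{61.05 + 69.81}
  = \frac{8523.80}{130.86}
  \approx 65.14\,\%
```

This agrees with the reported 65.13 % up to the rounding of P and R.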
4.2 Sentiment Event Tracking
Temporal sentiment identification from social events was carried out by Fukuhara et al. (2007), who analyzed the temporal trends of sentiments and topics in a timestamped text archive of weblog and news articles. Their system produces two kinds of graphs: a topic graph, which shows the temporal change of topics associated with a sentiment, and a sentiment graph, which shows the temporal change of sentiments associated with a topic. Mishne and de Rijke (2006a, b) also proposed a system, MoodViews, that analyzes temporal change using the 132 sentiments (moods) used in LiveJournal. For information visualization, Havre et al. (2002) proposed ThemeRiver, a system that visualizes thematic flows along a timeline.
Das et al. (2011) studied the temporal relations between events associated with similar or different types of sentiments, and visualized sentiment flows over events based on temporal expressions, by incorporating knowledge of temporal relations (e.g., AFTER, BEFORE, OVERLAP). The authors addressed the task of identifying the temporal relations between events that occur in two consecutive sentences and classifying the event pairs into their respective temporal classes. By adding a sentiment feature to the set of standard features, their system outperforms all the state-of-the-art systems that participated in TempEval-2007. Both coarse-grained sentiments (positive or negative) and fine-grained sentiments (Ekman’s six basic universal emotions) are assigned to the events. Based on the temporal relations, the events of each document are represented as a directed graph whose shallow paths are used to identify sentiment changes over events. A separate graph is generated for each document of the TempEval-2007 event corpus.
The sentiment of each sentence is assigned to the event it contains, and each event is represented as a node in the graph. Event nodes that share a sentiment are connected to the corresponding sentiment hub according to their annotated sentence-level sentiment tags.
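A minimal sketch of the per-document graph described above, assuming that each event already carries a sentiment label and that temporal relations arrive as hypothetical `(e1, relation, e2)` triples read as "e1 relation e2". AFTER edges are normalized into BEFORE edges so that arrows always point from the earlier to the later event, OVERLAP edges are kept in textual order (an arbitrary choice made here), and sentiment hubs are modelled as a mapping from each sentiment to the events linked to it. Function and variable names are illustrative, not taken from the original system.

```python
from collections import defaultdict

TEMPORAL_RELATIONS = {"BEFORE", "AFTER", "OVERLAP"}


def build_event_graph(event_sentiments, relations):
    """Return (edges, hubs) for one document.

    event_sentiments: {event_id: sentiment}
    relations: [(e1, relation, e2)], read as "e1 relation e2"
    edges: {(source, target): relation}, pointing from earlier to later event
    hubs: {sentiment: set of event ids linked to that sentiment hub}
    """
    edges = {}
    for e1, relation, e2 in relations:
        if relation not in TEMPORAL_RELATIONS:
            raise ValueError(f"unknown temporal relation: {relation}")
        if relation == "AFTER":          # e1 AFTER e2 is the same as e2 BEFORE e1
            edges[(e2, e1)] = "BEFORE"
        else:                            # BEFORE, or OVERLAP kept in textual order
            edges[(e1, e2)] = relation

    hubs = defaultdict(set)
    for event, sentiment in event_sentiments.items():
        hubs[sentiment].add(event)
    return edges, dict(hubs)


edges, hubs = build_event_graph(
    {"e1": "positive", "e2": "negative", "e3": "negative"},
    [("e1", "BEFORE", "e2"), ("e3", "AFTER", "e2")],
)
print(edges)  # {('e1', 'e2'): 'BEFORE', ('e2', 'e3'): 'BEFORE'}
print({s: sorted(ev) for s, ev in hubs.items()})  # {'positive': ['e1'], 'negative': ['e2', 'e3']}
```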
Sentiment tracking covers two phenomena: sentiment twist and sentiment transition. A sentiment twist is a change of sentiment between two consecutive events, whereas a sentiment transition spans more than two sentiment events (as shown in Fig. 4). Sentiment changes are identified from the AFTER, BEFORE and OVERLAP temporal relations, and ambiguous OVERLAP cases are captured by the BEFORE-OR-OVERLAP and OVERLAP-OR-AFTER relations. Because the TimeML corpus contains fewer instances of sentiment transitions than of sentiment twists, sentiment transitions are identified from the sentiment twists of the intermediate event pairs in an event chain.
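Read literally, a twist is a sentiment change between neighbouring events, and a transition can be assembled from the twists of the intermediate pairs in a chain. The sketch below is one plausible reading of that procedure, not the authors' code: it assumes the event chain has already been put in temporal order (for example from the BEFORE/AFTER edges) and ignores the ambiguous OVERLAP cases; the function names are hypothetical.

```python
def sentiment_twists(chain):
    """chain: [(event_id, sentiment)] in temporal order.
    Return every twist as (event_a, event_b, sentiment_a, sentiment_b)."""
    return [
        (a, b, sa, sb)
        for (a, sa), (b, sb) in zip(chain, chain[1:])
        if sa != sb
    ]


def sentiment_transition(chain):
    """Compose the twists of the intermediate event pairs into a transition:
    the sequence of sentiments the chain passes through. Returns None when
    the chain has fewer than three events or contains no twist."""
    if len(chain) < 3:
        return None
    twists = sentiment_twists(chain)
    if not twists:
        return None
    return [twists[0][2]] + [twist[3] for twist in twists]


chain = [("e1", "positive"), ("e2", "positive"), ("e3", "negative"),
         ("e4", "negative"), ("e5", "positive")]
print(sentiment_twists(chain))
# [('e2', 'e3', 'positive', 'negative'), ('e4', 'e5', 'negative', 'positive')]
print(sentiment_transition(chain))  # ['positive', 'negative', 'positive']
```

Building transitions only from twists keeps the rarer, longer-range pattern dependent on the more frequent pairwise one, which mirrors the data-sparsity argument in the paragraph above.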
5 Conclusions
This chapter gave an introduction to emotion analysis, its foundations and representative state-of-the-art approaches. In addition, the main problems and issues were discussed. Owing to its many challenging research problems and wide variety of practical applications, emotion analysis has been a very active research area in recent years.
We mainly focused on feature-based emotion analysis, which exploits the full power of the underlying abstract model. Results of a medium-sized implemented prototype are reported in terms of standard metrics such as F-score and accuracy.
Finally, it is important to highlight that all emotion analysis tasks are very challenging from the perspective of social computing, and our understanding of the various issues that arise is still limited. The main reason is that emotion analysis is a natural language processing task, and natural language processing has no easy problems. In addition, the most effective machine-learning algorithms (e.g., CRF, SVM) do not produce human-understandable results: although they may achieve improved accuracy, we know little about how and why, apart from some basic knowledge gained in the manual feature selection process.
References
Ahmad K (2011) Affective computing and sentiment analysis: emotion, metaphor and terminology. Springer text, speech and language technology series, vol 45. Springer, Heidelberg
Alm CO, Roth D, Sproat R (2005) Emotions from text: machine learning for text-based emotion prediction. In: Proceedings of HLT-EMNLP. Association for Computational Linguistics, Stroudsburg, PA, pp 579–586
Aman S, Szpakowicz S (2007) Identifying expressions of emotion in text. In: Matoušek V, Mautner P (eds) Text, speech and dialogue. Lecture notes in computer science 4629. Springer, Heidelberg, pp 196–205
Arnold MB (1960) Emotion and personality. Columbia University Press, New York
Baccianella S, Esuli A, Sebastiani F (2010) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Proceedings of the 7th conference on language resources and evaluation, Valletta, Malta, pp 2200–2204
Banea C, Mihalcea R, Wiebe J (2008) A bootstrapping method for building subjectivity lexicons for languages with scarce resources. In: The sixth international conference on language resources and evaluation (LREC 2008), Marrakech, Morocco
Banerjee S, Das D, Bandyopadhyay S (2010) Bengali verb subcategorization frame acquisition - a baseline model. In: Proceedings of the 7th workshop of Asian Language Resources (ALR-7), Joint conference of the 47th annual meeting of the association for computational linguistics and the 4th international joint conference on natural language processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP-2009), Suntec, Singapore, pp 76–83
Baroni M, Vegnaduzzo S (2004) Identifying subjective adjectives through web-based mutual information. In: Proceedings of the German Conference on NLP, Vienna
Bethard S, Yu H, Thornton A, Hatzivassiloglou V, Jurafsky D (2004) Automatic extraction of opinion propositions and their holders. In: AAAI Spring symposium on exploring attitude and affect in text: theories and applications. AAAI, Palo Alto, CA
Chesley P, Vincent B, Xu L, Srihari RK (2006) Using verbs and adjectives to automatically classify blog sentiment. In: Proceedings of AAAI Spring symposium on computational approaches to analyzing weblogs. AAAI, Palo Alto, CA, pp 25–28
Choi Y, Cardie C, Riloff E, Patwardhan S (2005) Identifying sources of opinions with conditional random fields and extraction patterns. In: Proceedings of HLT/EMNLP, Vancouver, BC, Canada
Das D, Bandyopadhyay S (2009) Word to sentence level emotion tagging for Bengali blogs. In: ACL-IJCNLP 2009, Singapore, pp 149–152
Das D, Bandyopadhyay S (2010a) Sentence level emotion tagging on blog and news corpora. J Intell Syst 19(2):125–134
Das D, Bandyopadhyay S (2010b) Emotion holder for emotional verbs – the role of subject and syntax. In: Gelbukh A (ed) CICLing-2010. Lecture notes in computer science 6008. Springer, Heidelberg, pp 385–393
Das D, Bandyopadhyay S (2010c) Identifying emotion topic – an unsupervised hybrid approach with rhetorical structure and heuristic classifier. In: Proceedings of the 6th IEEE NLP-KE 2010, Beijing, 21–23 Aug 2010. ISBN 978-1-4244-6896-6
Das D, Bandyopadhyay S (2010d) Discerning emotions of bloggers based on topics – a supervised coreference approach in Bengali. In: Proceedings of the 22nd conference on computational linguistics and speech processing (ROCLING 2010), Taiwan, pp 350–360
Das D, Bandyopadhyay S (2010e) Developing Bengali WordNet affect for analyzing emotion. In: Proceedings of the 23rd international conference on the computer processing of oriental languages (ICCPOL-2010), Redwood City, CA, pp 35–40
Das D, Bandyopadhyay S (2010f) Identifying emotional expressions, intensities and sentential emotion tags using a supervised framework. In: 24th PACLIC, Tohoku University, Sendai
Das D, Bandyopadhyay S (2011a) Emotions on Bengali blog texts: role of holder and topic. First workshop on social network analysis in applications (SNAA 2011). ASONAM 2011:587–592. doi:10.1109/ASONAM.2011.106
Das D, Bandyopadhyay S (2011b) Tracking emotions of bloggers – a case study for Bengali. POLIBITS 45:53–59
Das D, Bandyopadhyay S (2011c) Document level emotion tagging–machine learning and resource based approach. J Comput Sist (CyS) 15(2):221–234
Das D, Kolya AK, Ekbal A, Bandyopadhyay S (2011) Temporal analysis of sentiment events – a visual realization and tracking. In: Gelbukh A (ed) Proceedings of 12th international conference on intelligent text processing and computational linguistics (CICLing-2011). Lecture notes in computer science 6608, Tokyo. Springer, Heidelberg, pp 417–428
Ekman P (1992) Facial expression and emotion. Am Psychol 48(4):384–392
Esuli A, Sebastiani F (2006) SENTIWORDNET: a publicly available lexical resource for opinion mining. In: LREC’06, Genoa
Evans DK (2007) A low-resources approach to opinion analysis: machine learning and simple approaches. NTCIR, Chiyoda-ku
Fukuhara T, Nakagawa H, Nishida T (2007) Understanding sentiment of people from news articles: temporal sentiment analysis of social events. In: ICWSM’2007, Boulder, CO
Grefenstette G, Qu Y, Shanahan JG, Evans DA (2004) Coupling niche browsers and affect analysis for an opinion mining application. In: RIAO-04, Avignon, pp 186–194
Havre S, Hetzler E, Whitney P, Nowell L (2002) ThemeRiver: visualizing thematic changes in large document collections. IEEE Trans Vis Comput Graph 8(1):9–20
Hu J, Guan C, Wang M, Lin F (2006) Model of emotional holder. In: Shi Z-Z, Sadananda R (eds) PRIMA 2006. Lecture notes in computer science (LNAI) 4088. Springer, Heidelberg, pp 534–539
Izard CE (1971) The face of emotion. Appleton-Century-Crofts, New York
James W (1884) What is an emotion? Mind 9:188–205
Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: European conference on machine learning, Chemnitz, 21–24 Apr 1998, pp 137–142
Kim SM, Hovy E (2006) Extracting opinions, opinion holders, and topics expressed in online news media text. In: Workshop on sentiment and subjectivity in ACL/Coling, Sydney
Kim Y, Jung Y, Myaeng S-H (2007) Identifying opinion holders in opinion text from online newspapers. In: 2007 IEEE international conference on granular computing, San Jose, CA, pp 699–702
Kobayashi N, Inui K, Matsumoto Y, Tateishi K, Fukushima T (2004) Collecting evaluative expressions for opinion extraction, IJCNLP 2004. Springer, Berlin
Kolya AK, Das D, Ekbal A, Bandyopadhyay S (2011) Identifying event–sentiment association using lexical equivalence and co-reference approaches. In: Proceedings of the workshop on relational models of semantics (RELMS 2011), ACL-HLT, Portland, OR, pp 19–27
Ku LW, Liang YT, Chen HH (2006) Opinion extraction, summarization and tracking in news and blog corpora. In: AAAI-CAAW2006, Stanford University, CA, 27–29 Mar 2006, pp 100–107
Lafferty J, McCallum AK, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of 18th international conference on machine learning, Corvallis, OR
Lin C-Y (1997) Robust automated topic identification. Faculty of the Graduate School, University of Southern California. ACL, pp 308–310
Lin KH-Y, Yang C, Chen H-H (2007) What emotions news articles trigger in their readers? SIGIR, pp 733–734
Liu B (2009) The challenge is still the accuracy of sentiment prediction and solving the associated problems. In: 5th Annual text analytics summit, Boston, MA
Liu B (2010) Sentiment analysis and subjectivity. In: Indurkhya N, Damerau FJ (eds) Handbook of natural language processing, 2nd edn. CRC, Boca Raton, FL
Mann WC, Thompson S (1988) Rhetorical structure theory: toward a functional theory of text organization. TEXT 8:243–281
McDougall W (1926) An introduction to social psychology. Luce, Boston
Mihalcea R, Banea C, Wiebe J (2007) Learning multilingual subjective language via cross-lingual projections. In: Annual meeting of the Association for Computational Linguistics, Prague, pp 976–983
Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
Mishne G, de Rijke M (2006a) Capturing global mood levels using blog posts. In: AAAI-CAAW2006, Stanford University, CA, 27–29 Mar 2006, pp 145–152
Mishne G, de Rijke M (2006b) MoodViews: tools for blog mood analysis. In: AAAI 2006 Spring symposium on computational approaches to analyzing weblogs, Stanford, CA
Mohammad S, Turney PD (2010) Emotions evoked by common words and phrases: using mechanical Turk to create an emotion lexicon. In: Proceedings of the NAACL-HLT 2010 workshop on computational approaches to analysis and generation of emotion in text, Los Angeles, CA, pp 26–34
Myers DG (2004) Theories of emotion. Psychology, 7th edn. Worth Publishers, New York, NY, p 500
Neviarouskaya A, Prendinger H, Ishizuka M (2007) Narrowing the social gap among people involved in global dialog: automatic emotion detection in blog posts. ICWSM, Boulder, CO
Ortony A, Turner TJ (1990) What’s basic about basic emotions? Psychol Rev 97:315–331
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135
Parrott W (2001) Emotions in social psychology. Psychology Press, Philadelphia
Plutchik R (1980) A general psychoevolutionary theory of emotion. In: Plutchik R, Kellerman H (eds) Emotion: theory, research, and experience, vol 1, Theories of emotion. Academic Press, New York, pp 3–31
Polanyi L, Zaenen A (2006) Contextual valence shifters. In: Shanahan JG, Qu Y, Wiebe J (eds) Computing attitude and affect in text: theory and applications, Chap 1. Springer, Heidelberg, pp 1–10
Popescu A, Etzioni O (2005) Extracting product features and opinions from reviews. In: Proceedings of HLT/EMNLP, Vancouver, BC, 6–8 Oct 2005
Quirk R, Greenbaum S, Leech G, Svartvik J (1985) A comprehensive grammar of the English language. Longman, London
Read J, Carroll J (2010) Annotating expressions of appraisal in English. Lang Resour Eval. doi:10.1007/s10579-010-9135-7
Seki Y (2007) Opinion holder extraction from author and authority viewpoints. In: Proceedings of the SIGIR’07, ACM 978-1-59593-597-7/07/0007
Sood S, Vasserman L (2009) ESSE: Exploring mood on the web. In: Proceedings of the 3rd international AAAI conference on weblogs and social media (ICWSM), San Jose, CA, 17–20 May 2009
Stein B, Eissen SMz (2004) Topic identification: framework and application. Paderborn University, Paderborn
Stone PJ (1966) The general inquirer. A computer approach to content analysis. MIT Press, Cambridge, MA
Stoyanov V, Cardie C (2008a) Annotating topics of opinions. In: Proceedings of LREC, Marrakech, 26 May–1 June 2008
Stoyanov V, Cardie C (2008b) Topic identification for fine-grained opinion analysis. Coling 2008:817–824
Strapparava C, Mihalcea R (2007) SemEval-2007 Task 14: affective text. In: 45th ACL, Prague, 23–30 June 2007
Strapparava C, Valitutti A (2004) WordNet-Affect: an affective extension of WordNet. In: Proceedings of the 4th international conference on language resources and evaluation (LREC 2004), Lisbon, May 2004, pp 1083–1086
Swier RS, Stevenson S (2004) Unsupervised semantic role labelling. In: EMNLP, Barcelona
Turney PD (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th ACL, Philadelphia, pp 417–424
Voll K, Taboada M (2007) Not all words are created equal: extracting semantic orientation as a function of adjective relevance. In: Proceedings of the 20th Australian joint conference on artificial intelligence, Gold Coast, pp 337–346
Watson JB (1930) Behaviorism. University of Chicago Press, Chicago
Wiebe J, Wilson T, Cardie C (2005) Annotating expressions of opinions and emotions in language. LRE 39(2–3):165–210
Yang C, Lin KH-Y, Chen H-H (2007) Emotion classification using web blog corpora. In: Proceedings of the IEEE, WIC, ACM international conference on web intelligence, Silicon Valley, 2–5 Nov 2007, pp 275–278
Yu N (2009) Opinion detection for web content. Comput Linguistics 39
Zhang Y, Li Z, Ren F, Kuroiwa S (2008) A preliminary research of Chinese emotion classification model. IJCSNS 8(11):127–132
Acknowledgments
The work reported in this paper was supported by a grant from the India-Japan Cooperative Programme (DST-JST) 2009 Research project entitled “Sentiment Analysis where AI meets Psychology” funded by Department of Science and Technology (DST), Government of India.
© 2014 Springer-Verlag Wien
Das, D., Bandyopadhyay, S. (2014). Emotion Analysis on Social Media: Natural Language Processing Approaches and Applications. In: Agarwal, N., Lim, M., Wigand, R. (eds) Online Collective Action. Lecture Notes in Social Networks. Springer, Vienna. https://doi.org/10.1007/978-3-7091-1340-0_2
DOI: https://doi.org/10.1007/978-3-7091-1340-0_2
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-1339-4
Online ISBN: 978-3-7091-1340-0
eBook Packages: Computer Science, Computer Science (R0)