1 Introduction

In recent times, research on opinion, sentiment and emotion in natural language texts and other media has been gaining ground under the umbrella of Subjectivity Analysis and Affective Computing. Subjectivity Analysis aims to identify whether a sentence expresses an opinion and, if so, whether the opinion is positive or negative (Liu 2010). Emotions are subjective feelings and thoughts, and the strengths of opinions are closely related to the intensities of certain emotions, e.g., joy and anger (Liu 2010). Though the concepts of emotion and opinion are not equivalent, they have a large intersection. Emotion analysis has also been studied within Affective Computing, a key area of research in computer science. Most of the other fields in which emotion is analyzed are hermeneutics, psychology, philosophy, sociology, biology and political science.

The 24/7 news sites and blogs facilitate the expression and shaping of opinion or emotion locally and globally (Ahmad 2011). Emails, Weblogs, chat rooms, online forums and even Twitter are considered effective social media for discussing recent topics. Blogs are the most important, communicative and informative repository of text-based emotional contents in Web 2.0 (Yang et al. 2007). Facebook, LinkedIn and even Google+ also contain a blog-like structure. Many blogs act as online diaries in which bloggers report their daily activities and surroundings. Sometimes, the blog posts are annotated by other bloggers. Therefore, blogs are considered personal journals where people express their personal opinions on different aspects such as products, tourist places they have visited, politics and current happenings in society. The blog posts contain instant views, updated views or influenced views regarding single or multiple topics.

Nowadays, people post the product reviews at merchant sites and express their views on discussion forums, blogs and social network sites. But, with huge amounts of such social text being generated, it is important to find methods that can annotate and organize documents in meaningful ways. Thus, topic identification is also used for document ranking in information retrieval systems. In addition to the content of the document itself, other relevant information about a document such as related topics can often enable a faster and more effective search or classification.

Topic identification from social sites is also essential in connection with categorizing search applications (Stein and Eissen 2004). Categorizing search means applying text categorization facilities to retrieval tasks where a large number of documents are returned. Categorizing search has attracted much interest recently; its potential has been recognized by users and search engine developers alike. Over the last few years, categorization based on the opinionated, sentiment or emotional contents of documents on the World Wide Web has also begun.

From the perspective of natural language processing (NLP) applications, emotion analysis has been considered as a sub-discipline at the crossroads of information retrieval (Pang and Lee 2008) (Sood and Vasserman 2009) and computational linguistics (Wiebe et al. 2005). Emotions, of course, are not linguistic things. However, the most convenient access that we have to them is through language (Strapparava and Valitutti 2004). Natural language texts not only contain informative contents, but also more or less attitudinal private information including emotions. It has been observed that the classification of reviews (Turney 2002) or newspaper articles (Lin et al. 2007) is also increasingly incorporating emotion analysis within their scope. But the identification of emotions from texts is not an easy task because of its restricted access in case of objective observation or verification (Quirk et al. 1985). Moreover, the same textual content can be presented with different emotional slants (Grefenstette et al. 2004).

It is said that sentiment or emotion is typically a localized phenomenon that is more appropriately computed at the paragraph, sentence or entity level (Liu 2009). But emotion analysis can be performed at several levels of granularity: word, phrase, sentence, clause, paragraph or document (Das and Bandyopadhyay 2009, 2010a). Yu (2009) also proposed several granularity levels such as terms, expression, statement, passage and document.

It is sometimes observed that the topics discussed at the sentence level are not similar to the topic of the overall document. Thus, it is important to find methods that can annotate the documents in meaningful ways so that the topic identification can also be used for document ranking in information retrieval systems. One of the important works that proposed various insights and solutions related to topic identification is described in Lin (1997). Chesley et al. (2006) present experiments on subjectivity and polarity classifications of topic- and genre-independent blog posts. Emotion analysis also involves identifying the emotion holder in addition to the emotion topic. An emotion agent or holder is the person or organization expressing the emotion (Wiebe et al. 2005). In the case of product reviews and blogs, holders are usually the authors of the posts. Extraction of the emotion holder is important to discriminate between emotions that are viewed from different perspectives (Seki 2007). By grouping emotion holders of different stances on diverse social and political issues, we can have a better understanding of the relationships among countries or among organizations (Kim and Hovy 2006).

Sometimes, it is found that the sentences of a document may or may not contain any direct clue for the emotional expression, especially in social sites (“Dream of music is in their eyes and hearts”.) or there are certain example sentences that contain emotional expression without a potential holder (“His acting was really attractive”), or topic (“I fall into cry”). Thus, with such examples and problems in mind, it can be hypothesized that the notion of user-topic co-reference will facilitate both the manual and automatic identification of emotions. Therefore, we consider the emotional expression, holder and topic as the three essential components of emotion (Das and Bandyopadhyay 2010d).

Tracking the emotions expressed in online forums or news, whether about events and politics or for customer relationship management, is an important task, as is determining the emotion holder and topic. Thus, we can also track users’ emotions expressed in online forums, blogs, Twitter messages or social networking sites for different applications such as sentiment review, customer management and stock exchange prediction. The identification of the temporal trends of emotions and topics has drawn the recent attention of NLP communities (Fukuhara et al. 2007; Das and Bandyopadhyay 2011a) because people’s emotions have a great influence on society (Das and Bandyopadhyay 2011b).

Apart from the commercial perspectives, the other potential contributions of the present chapter in relation to the book are as follows:

  • We aim to identify the emotional changes among users during their communication in the context of social networking. We mainly focus on the blog users and their topics of discussion for identifying emotional changes with respect to time. We have considered the social interactions of the bloggers as collective actions, and their emotional changes based on such actions have been measured from the perspectives of topic and time.

  • We have incorporated the knowledge of two types of stimuli, self and influential affects in identifying the behavioural contingencies from the social interactions of the blog users.

  • We consider collective actions as events, and therefore the emotional change with respect to events has been represented using the graphical notion of sentiment event tracking.

Thus, based on the above issues, we have organized this chapter as follows: Section 2 describes the concepts and motivations related to emotion analysis. Section 3 discusses a prototype for emotion analysis which identifies several components of emotion and the need of emotional co-reference for the components. Section 4 describes the application of the system in terms of time-based emotion tracking. Finally, Section 5 highlights the main conclusions of this chapter.

2 Concepts and Motivations

Several frameworks exist from various fields of academic study, such as cognitive science, linguistics and psychology, that can inform and augment analyses of sentiment, opinion and emotion (Read and Carroll 2010). Some of the generalized definitions of emotions are as follows:

Definition1:

Emotion is a complex psycho-physiological experience of an individual’s state of mind as interacting with biochemical (internal) and environmental (external) influences. In humans, emotion fundamentally involves physiological arousal, expressive behaviours and conscious experience (Myers 2004).

Definition2:

In psychology and common use, emotion is an aspect of a person’s mental state of being, normally based on or tied to the person’s internal (physical) and external (social) sensory feeling (Zhang et al. 2008).

Emotions, of course, are not linguistic objects/entities, and it is also said that the most non-phenomenological access to emotion is language (Ortony and Turner 1990). But the identification of emotions from texts is not an easy task due to the following challenges:

  • Basic and complex categories of emotions (James 1884; McDougall 1926; Watson 1930; Arnold 1960; Izard 1971; Plutchik 1980; Ekman 1992; Parrott 2001 and several others)

Though there are several other theories of emotion, the debate is concerned with some basic and complex categories, where complex emotions could arise from cultural conditioning or association combined with basic emotions. However, researchers have still not agreed on a set of basic emotions. Ekman (1992), for instance, derived a list of six basic emotions from subjects’ facial expressions, which several researchers have employed as classes in affect recognition tasks. Thus, we have presently confined ourselves to these six classes of emotions.

  • Restricted access in case of objective observation or verification (Quirk et al. 1985)

  • Presentation of the same textual content with different emotional slants (Grefenstette et al. 2004)

Although there are only 6 forms of emotions, there are a large number of language expressions that can be used to express them.

  • Level of granularity for processing of emotional evaluative expressions (word/phrase/clause/sentence/paragraph/document) (Wiebe et al. 2005; Das and Bandyopadhyay 2009, 2010a; Liu 2009; Yu 2009)

  • Valence (positive, negative, neutral) or Ekman’s six emotion types: “anger”, “disgust”, “fear”, “joy”, “sadness”, “surprise”

  • Intensity (low, medium, high, etc.)

  • Aspects and attributes (Holder/Target/Topic) (Kim and Hovy 2006; Das and Bandyopadhyay 2010b, c)

Thus, it is very hard to define emotion and to identify its regulating or controlling factors. These aspects raise the need of syntactic, semantic and pragmatic analysis of a text (Polanyi and Zaenen 2006).

But the majority of subjective analysis methods that are related to emotion are based on textual keywords spotting that uses specific lexical resources. SentiWordNet (Baccianella et al. 2010) is a lexical resource that assigns positive, negative and objective scores to each WordNet synset (Miller 1995). Subjectivity wordlist (Banea et al. 2008) assigns words with strong or weak subjectivity and prior polarities of types positive, negative and neutral. Some well-known sentiment lexicons have been developed, such as General Inquirer System (Stone 1966), Subjective adjective list (Baroni and Vegnaduzzo 2004), English SentiWordNet (Esuli and Sebastiani 2006), Taboada’s adjective list (Voll and Taboada 2007), etc. But all the mentioned resources are in English and have been used in coarse-grained sentiment analysis (e.g., positive, negative or neutral). The characterization of the words and phrases according to their emotive tone was first carried out in Turney (2002).

In recent trends, the application of mechanical turk for generating emotion lexicons (Mohammad and Turney 2010) shows promising results. The opinion or emotion annotation of a language has been performed for several natural language domains such as news (Strapparava and Mihalcea 2007), blogs (Mishne and de Rijke 2006a, b) or others. Opinion mining at word, sentence and document levels along with opinion summarization on news and Weblog documents is discussed in Ku et al. (2006). In order to estimate the affects in text, the model proposed in Neviarouskaya et al. (2007) processes symbolic cues and employs NLP techniques for word, phrase and sentence level analysis. Several machine-learning techniques were employed on blog data to identify the mood of the authors during writing (Mishne and de Rijke 2006a, b). The text-based emotion prediction using such supervised machine-learning approaches based on the SNoW learning architecture is discussed in Alm et al. (2005).

Prior work in identification of opinion or emotion holders has sometimes identified only a single opinion per sentence (Bethard et al. 2004) and sometimes several (Choi et al. 2005). An identification of opinion holders for question answering with supporting annotation tasks was attempted in Wiebe et al. (2005). The techniques that were employed to detect the holders are based on labelling the arguments of the verbs with their semantic roles (Swier and Stevenson 2004) or syntactic models (Das and Bandyopadhyay 2010b), emotion knowledge base (Hu et al. 2006), machine–learning-based classification (Evans 2007), etc. The anaphor resolution–based opinion holder identification method also exploits the lexical and syntactic information (Kim et al. 2007).

On the other hand, emotion topic can be defined as the real-world object, event or abstract entity that is the primary subject of the emotion or opinion as intended by its holder (Stoyanov and Cardie 2008a). The topic depends on the context in which its associated emotional expression occurs (Stoyanov and Cardie 2008b). In the related area of opinion topic extraction, different researchers have contributed their efforts. Some of the works are mentioned in Kobayashi et al. (2004), Popescu and Etzioni (2005). But all these works are either based on lexicon lookup or are applied on the domain of product reviews. The opinion topics are not necessarily spatially coherent in social texts as there may be two opinions in the same sentence on different topics as well as opinions that are on the same topic separated by opinions that do not share that topic (Stoyanov and Cardie 2008a). Not only topics, but the relation between sentiment and event can also be identified using the knowledge of lexical equivalence and co-reference approaches (Kolya et al. 2011).

The majority of the existing works in this field have been conducted for English. As far as our discussions on related work mentioned earlier are concerned, to the best of our knowledge, at present, only a few corpora or systems are available for analyzing emotions in languages other than English. In the present chapter, we have considered the Bengali language for our case studies. Bengali is the fifth most popular language in the world, second in India and the national language in Bangladesh. Hence, the present work is also a foray into emotion analysis for an Indian language.

It can be concluded that the perspectives of sociology, psychology and commerce, along with the close association among people, topics and sentiments, motivate us to gain insight into the emotional changes of people over topic and time.

3 Approaches to Identifying Emotion Components

There are two main approaches to tackle the issues of emotion analysis: A “rule-based” approach defines markers and linguistic syntactic rules so as to determine the emotions of a text (Mihalcea et al. 2007). It also addresses the semantics associated with emotions by using conceptual resources such as lexicons. On the other hand, a “corpus-based” approach uses an annotated corpus to build a system which identifies emotions. In the corpus-driven approach, the language choice only impacts the corpus selection. However, most successful approaches depend on syntactic rules or text semantics of either keywords or phrases.

3.1 Emotional Expression

It is said that sentiment and/or emotion is typically a localized phenomenon that is more appropriately computed at the paragraph, sentence or entity level (Liu 2009). But emotion analysis can be performed at several levels of granularity, from word, phrase and sentence to document levels. Thus, we can propose the prototype systems of identifying evaluative emotional expressions at different levels of granularities such as word (W), phrase (P), sentence (S) and document (D) (Das and Bandyopadhyay 2010a, 2010f; Das and Bandyopadhyay 2011c).

The baseline system for word-level emotion tagging, which measures the performance with respect to each emotion class, has been developed in Das and Bandyopadhyay (2010a). Each word of the corpus is passed through six separate modules (one per emotion class) to be tagged with the appropriate class label. In addition, conditional random field (CRF) (Lafferty et al. 2001)– and support vector machine (SVM) (Joachims 1998)–based machine-learning classifiers are also employed for word-level emotion tagging. Different singleton features (e.g., part of speech (POS) of the words, question words, reduplication, colloquial/foreign words, special punctuation symbols, negations, emoticons) and context features (e.g., unigram, bigram) at the word and POS tag level, along with their different combinations, are used for training and testing. Some of the common features are as follows:

  • Bag of Words: Considers all terms in the corpus and builds a vector per document, where each dimension expresses the presence or absence of the term in the document. This approach can be refined adding the term importance in the corpus.

  • Emotion/Affect Words (EW): The presence of a word in the WordNet Affect lists (Strapparava and Valitutti 2004) identifies the emotion/affect words.

  • Parts of Speech (POS): We are interested in the verb, noun, adjective and adverb words as these are emotion-informative constituents.

  • Bag of features: Some features are chosen, e.g., some units with high frequency in the corpus, such as unigrams, n-grams, POS, adjectives, etc. In order to represent it, usually each feature is put in a dimension of a vector representing the text fragment. In each dimension, usually only the feature presence or frequency is recorded.

  • Intensifiers (INTF): Dependency relations such as amod() [adjectival modifier] and advmod() [adverbial modifier] containing JJ (adjective) and RB (adverb) tagged elements are considered as intensifiers.

  • Direct and Transitive Dependency relations (DD and TD): The direct dependency (DD) is identified based on the simultaneous presence of the emotion word and the other word in the same dependency relation, whereas the transitive dependencies (TD) are verified if they are connected via one or more intermediate dependency relations.

  • Negations (NEG): Dependency relations such as neg_( ) [negation modifier] or the negative words from a manually prepared list (no, not, nor, neither, etc.), e.g., “there’s no way I’m turning it down”, are adopted for the Negations feature.

  • Conjuncts (CONJ): The dependency relation conj_( ) [conjunct] identifies the Conjuncts features (and, or, but), e.g., “But I asked him about WHY he doesn’t cook for himself”.

  • Punctuation Symbols (Sym): Symbols such as comma (,), (!), (?) are often used in single or multiple numbers to emphasize emotional expressions and considered as crucial clues for identifying emotional presence (“I can’t believe she is FINALLY here!!!”).

  • Discourse Markers (DM): The present task aims to identify only the explicit discourse markers that are tagged by conjunctive_() or mark_() type dependency relations of the parsed constituents (e.g., as, because, while, whereas).

  • Capitalized Phrases (CP): A capitalized word or a long capitalized phrase segment, e.g., “I forgot how demeaning BME classes are” or “Terrorists MAKE ME SICK, they ought to all be horrifically detained”, is considered as the Capitalized Phrases feature.

  • Emoticons (emot_icon): The emoticons and their consecutive occurrences generally contribute as much as real sentiment to the words that precede or follow them.
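Several of the surface-level features listed above (emotion words, negations, punctuation symbols and capitalized phrases) can be computed from the raw sentence alone. The following is a minimal sketch of such an extractor, not the chapter's implementation: the function name `surface_features` and the two tiny word lists are hypothetical stand-ins for the WordNet Affect lists and the manually prepared negation list, which are not reproduced here. Dependency-based features (INTF, DD, TD, CONJ, DM) would additionally require a parser and are left out.

```python
import re

# Hypothetical mini-lexicons for illustration only; the chapter relies on
# WordNet Affect lists and a manually prepared negation list instead.
EMOTION_WORDS = {"sick": "disgust", "happy": "joy", "sad": "sadness"}
NEGATIONS = {"no", "not", "nor", "neither"}


def surface_features(sentence):
    """Return simple token-level clues (EW, NEG, Sym, CP) for one sentence."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    lowered = [t.lower() for t in tokens]
    return {
        # Tokens found in the (hypothetical) emotion word list.
        "emotion_words": [w for w in lowered if w in EMOTION_WORDS],
        # Tokens found in the (hypothetical) negation list.
        "negations": [w for w in lowered if w in NEGATIONS],
        # Runs of two or more '!' or '?' marks.
        "repeated_punct": re.findall(r"[!?]{2,}", sentence),
        # Fully upper-case tokens longer than one letter.
        "capitalized": [t for t in tokens if len(t) > 1 and t.isupper()],
    }


features = surface_features("Terrorists MAKE ME SICK, they ought to all be detained")
```

For the sentence above, the sketch records "sick" as an emotion word and "MAKE", "ME", "SICK" as capitalized tokens. In a CRF or SVM setting, each of these clues would be turned into a feature value rather than a list.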

Lexical analysis plays a crucial role to identify emotions from a text. For example, the words like love, hate, good, bad, happy and sad directly indicate emotion. But the assumption is true only within a limited context and restricted granularity. Let us consider the following example:

Example 1

“Though Mr. Jonathon Read could be generally happy about his car, his wife might be dissatisfied by the engine noise”.

If the sentence were rewritten by exchanging only the positions of the direct sentiment words (happy and dissatisfied), its meaning would become muddled:

“Though Mr. Jonathon Read could be generally dissatisfied about his car, his wife might be happy by the engine noise”.

Thus, the prime factors for disambiguation that come to mind are the components that are associated with the emotional expressions. In the example, Mr. Jonathon Read and his car are associated with the emotional expression happy, whereas his wife and engine noise are associated with the emotional expression dissatisfied. It is clear that Mr. Jonathon Read and his wife represent the emotion holders, whereas his car and engine noise represent the emotion topics, respectively.

3.2 Emotion Holder

The baseline model (BM) for identifying emotion holders in English can be designed based on the subject information of the parsed emotional sentences. We can employ the Stanford dependency parser to accomplish the task. Similarly, for morphologically rich languages, e.g., Bengali, the sentences are passed through an open source Bengali shallow parser that produces different morphological information (e.g., root, case, vibhakti, tam, suffixes, etc.). The lexical pattern–based phrase-level similarity clues containing different POS combinations, named entities (NEs) and noun phrases are considered for identifying the emotion holders.
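The subject-based baseline can be illustrated with a small sketch. It assumes that a dependency parser has already produced (relation, head, dependent) triples; the triples below are hand-written for Example 1 and simplified, so they are not actual parser output, and the function name `baseline_holders` is hypothetical. The only rule applied is that the grammatical subject (`nsubj`) governed by an emotion word is taken as its holder.

```python
def baseline_holders(dependencies, emotion_words):
    """Map each emotion word to the dependent of its nsubj relation, if any."""
    holders = {}
    for relation, head, dependent in dependencies:
        if relation == "nsubj" and head in emotion_words:
            holders[head] = dependent
    return holders


# Hand-written, simplified triples for Example 1 (an assumption, not parser output).
deps = [
    ("nsubj", "happy", "Read"),
    ("nmod", "happy", "car"),
    ("nsubj", "dissatisfied", "wife"),
    ("nmod", "dissatisfied", "noise"),
]

holders = baseline_holders(deps, {"happy", "dissatisfied"})
```

Here `holders` pairs "happy" with "Read" and "dissatisfied" with "wife". A rule this simple only looks at active-voice subjects, so sentences whose holder is not the grammatical subject would need additional handling.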

Emotion holders can also be identified based on the syntactic argument structure of the emotional sentences. In English, the head of each chunk in the dependency-parsed output helps in constructing the syntactic argument structure with respect to the key emotional verb. Two separate techniques can be used for extracting the argument structure. One is from the parsed result directly, and another is from the corpus that has been POS-tagged and chunked separately. Similarly, the verb-based argument structures are acquired from the chunk- or phrase-level lexical patterns (Kim and Hovy 2006; Choi et al. 2005; Das and Bandyopadhyay 2010b). The pivotal hypothesis considered in the syntactic model (SynM) is based on the hypothesis followed in Banerjee et al. (2010). The hypothesis is that if the acquired syntactic argument structure of a sentence matches any of the retrieved frame syntaxes of VerbNet, the holder roles (e.g., Experiencer, Agent, Actor, Beneficiary, etc.) associated with the VerbNet frame syntaxes are then assigned to the appropriate slots in the syntactic arguments of the sentence. For other languages, each acquired syntactic argument structure is mapped to all the possible frame syntaxes present for the corresponding verb in the English VerbNet.

3.3 Emotion Topic

Like emotion holders, the baseline model for identifying emotion topics is developed based on the object-related dependency-parsed relations. The phrase segments containing topic-related thematic roles (e.g., Topic, Theme, Event, etc.) are extracted from the verb-based syntactic argument structures of the sentences. On the other hand, a supervised model (SvdM) is adopted to identify multiple emotion topics along with their individual topic and target spans from each sentence. CRF, SVM and Fuzzy Classifier (FC) are employed by considering various features (e.g., the annotated emotional expressions along with direct and transitive dependencies, causal verbs, discourse markers, emotion holders, named entities and four types of similarity measures—Structural Similarity, Sentiment Similarity, Syntactic Similarity and Semantic Similarity) and their combinations (Kim and Hovy 2006; Das and Bandyopadhyay 2010c). A prototype system for identifying sentiment has been shown in Fig. 1 taken from the article (Kim and Hovy 2006). The incorporation of a special feature, Structural Similarity, that is based on the Rhetorical Structure Theory (Mann and Thompson 1988) improves the topic identification system.

Fig. 1 System framework

3.4 Need of Emotion Co-reference

The importance of the emotion-associated components such as holder and topic can easily be verified by interchanging their positions while keeping the positions of their corresponding emotional expressions intact. In Example 1, the following variations can be seen:

  1. “Though his wife could be generally happy about his car, Mr. Jonathon Read might be dissatisfied by the engine noise”.

  2. “Though Mr. Jonathon Read could be generally happy about the engine noise, his wife might be dissatisfied by his car”.

  3. “Though his wife could be generally happy about the engine noise, Mr. Jonathon Read might be dissatisfied by his car”.

Thus, a proper understanding of the emotion components and their associations is essential for mining emotion from texts. For this reason, document-level emotion classification sometimes fails to detect emotion, as it does not account for the individual emotion components. Sometimes a single topic is co-referred by several users, and multiple topics are co-referred by a single user. Fig. 2 shows Ekman’s six emotions plotted for 8 different topics as referred to by each of the 22 bloggers, indicating that the user-topic co-reference system captures the emotional views of the bloggers and their dependence on the associated topics. With the above examples and problems in mind, it can be hypothesized that co-reference among the emotion components will facilitate both the manual and automatic identification of emotions.
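As an illustration of how topic-based emotion percentages per blog user could be tabulated, the sketch below aggregates (user, topic, emotion) triples into a percentage distribution for every user-topic pair. The record format, the sample data and the function name `topic_emotion_percentages` are assumptions made for this example; the chapter does not specify how the underlying figures were computed.

```python
from collections import Counter, defaultdict


def topic_emotion_percentages(records):
    """records: iterable of (user_id, topic, emotion) triples.

    Returns {(user_id, topic): {emotion: percentage}}.
    """
    counts = defaultdict(Counter)
    for user_id, topic, emotion in records:
        counts[(user_id, topic)][emotion] += 1

    result = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        result[key] = {emo: 100.0 * n / total for emo, n in counter.items()}
    return result


# Invented sample annotations (not data from the chapter).
records = [
    (1, "music", "joy"),
    (1, "music", "joy"),
    (1, "music", "joy"),
    (1, "music", "sadness"),
    (1, "politics", "anger"),
    (2, "music", "surprise"),
]

percentages = topic_emotion_percentages(records)
```

With this sample, user 1 expresses 75 % joy and 25 % sadness on the topic "music", while the same user's single comment on "politics" is 100 % anger, which shows how one user's emotional view can differ by topic.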

Fig. 2 Topic-based emotions (in %) of the blog users

3.5 Results

Various experiments regarding symbolic feature, language, and domain-dependent features are carried out for evaluating the word-level emotion classification system. The lexical feature (e.g., POS, words of SentiWordNet and WordNet Affect) outperforms other features significantly. A different combination of context features also shows significant improvement in performance. Though we evaluated all our systems on large amounts of data and the corresponding results are mentioned in different research articles (Das and Bandyopadhyay 2009, 2010a, b, c, d, e, f, 2011b; Das and Bandyopadhyay 2011c), we have reported here only the best performed results on small test data as follows:

The word-level tagging system has demonstrated the average F-Scores of 83.65 % on 1,500 word tokens on English news corpus. The average F-Score of 65 % was achieved for 200 test sentences of SemEval 2007 corpus in sentential emotion tagging. A supervised system for English blogs (Aman and Szpakowicz 2007) outperforms the baseline system and achieves the average F-Scores of 82.72 %, 76.74 % and 89.21 % for emotional expressions, sentential emotions and intensities respectively on 565 gold standard test sentences.

It has been observed that the baseline model for English suffers in identifying emotion holders from the passive sentences. The dependency parser–based method achieves a better F-Score (66.98 %) than the other method (F-Score of 62.39 %) on a collection of 4,112 emotional sentences as the second method fails to disambiguate mostly the arguments from adjuncts. The maximum average F-Scores of the baseline and hybrid systems for emotion topic identification are 56.75 % and 58.88 % respectively on 500 sentences (Table 1).

Table 1 F-Scores (in %) of three emotional components

The supervised multi-engine voting system achieves the F-Scores of 70.51 % and 90.44 % for topic and target span identification respectively from the blog sentences. But the syntactic system suffers in resolving some errors (e.g., appositive cases, co-reference with emotional expression, multiple holders and topics, overlapping topic spans, anaphoric presence of the holders). Thus, some simple rule-based error reduction techniques based on rhetorical structure and emotional expressions are employed in the syntactic system.

It was observed that errors occur mostly for metaphoric and colloquial usages, unstructured sentences (e.g., “Really starting to lose it”) and sentences containing typographic errors (e.g., “she’s feeling very goooood about herself”). In micro-blogging services such as Twitter, the number of misspelled words is even higher due to the 140-character space constraint. In order to test the robustness of the proposed approaches, we would like to apply them to annotate a Twitter-extracted corpus.

4 Application in Emotion Tracking

4.1 Tracking Bloggers’ Emotions

The blog documents are generally stored in the format as shown in Fig. 3. Each of the blog documents is assigned with a unique identifier (docid#) followed by a section devoted for topics and several sections devoted for different users’ comments. Each comment section consists of several nested and overlapped subsections that also contain the bloggers’ comments. Each of the comment sections of an individual blogger is uniquely identified by the notion of section identification number (secid#). Each section contains the information regarding the identification number of the blog user (uid#) and the associated timestamp (tid#).

Fig. 3 General structure of a blog document

If we consider the individual comment section as a separate paragraph that contains several emotional sentences, the emotions present in such individual comment sections represent the emotional state of the blogger at that timestamp. The Referential Informative Chain (RIC) for each of the bloggers is constructed by acquiring the default annotated information like timestamp (tid#), unique identifier (uid#) and emotional comments that are acquired from the nested tree-like structure of the comment sections. The individual RIC is developed for each single blogger with respect to each comment section. Each node of an RIC denotes the emotional state of the blogger at a particular time instance, and the sequence of adding information into the nodes is based on the ascending order of associated timestamps. For example, in Fig. 2, the two nodes, namely, n1 and n2, will be added in front of an RIC developed for the blog user with uid = 1. The associated timestamps (t1, t4) and emotions will also be added into the nodes accordingly. As t4 > t1, the inclusion of node n1 is considered before the inclusion of n2 into the corresponding RIC of uid 1.
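The construction of the chains can be sketched as follows, assuming each comment section is available as a nested dictionary. The keys `uid`, `tid`, `emotion` and `subsections`, as well as the function names, are hypothetical; in particular, the emotion label of each section is assumed to have been assigned already. The sketch flattens the nested sections and orders each blogger's nodes by ascending timestamp.

```python
from collections import defaultdict


def collect_nodes(section, chains):
    """Add this section and all nested subsections to the per-user chains."""
    chains[section["uid"]].append((section["tid"], section["emotion"]))
    for child in section.get("subsections", []):
        collect_nodes(child, chains)


def build_rics(sections):
    """Return {uid: [(tid, emotion), ...]} with nodes in ascending tid order."""
    chains = defaultdict(list)
    for section in sections:
        collect_nodes(section, chains)
    return {uid: sorted(nodes, key=lambda node: node[0])
            for uid, nodes in chains.items()}


# Invented comment sections; the later section is listed first on purpose.
sections = [
    {"uid": 1, "tid": 4, "emotion": "anger",
     "subsections": [{"uid": 2, "tid": 5, "emotion": "sadness"}]},
    {"uid": 1, "tid": 1, "emotion": "joy",
     "subsections": [{"uid": 2, "tid": 2, "emotion": "surprise"}]},
]

rics = build_rics(sections)
```

For the blogger with uid 1, the node at timestamp 1 is placed before the node at timestamp 4, regardless of the order in which the sections were read.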

The identification of Ekman’s six basic emotions from the bloggers’ comments is carried out at sentence- and paragraph-level granularities. An affect-scoring technique is employed to identify the emotions of a state or node in each of the RICs of the bloggers. Two types of affect scores, Self Affect Score (SAS) and Influential Affect Score (IAS), are used to produce the Emotional Score (ES) of a blogger at each node with a particular timestamp (Das and Bandyopadhyay 2011c).

The changes in a blogger’s emotions are tracked based on the emotions assigned to the nodes of the blogger’s RIC. The importance of self and influential affects in the tracking results is evaluated using two techniques, extrinsic and intrinsic. In the extrinsic evaluation, the system achieves precision (P), recall (R) and F-score of 61.05%, 69.81% and 65.13%, respectively; in the intrinsic evaluation, it achieves average scores of 0.67 and 0.71 for the nominal alpha (Nα) and interval alpha (Iα) (Das and Bandyopadhyay 2011c).
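As a quick consistency check on the extrinsic figures, assume that the reported F-score is the balanced harmonic mean of precision and recall (the text does not state the formula, so this is an assumption). The reported P and R then give:

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 61.05 \times 69.81}{61.05 + 69.81}
  = \frac{8523.80}{130.86}
  \approx 65.14
```

This agrees with the reported 65.13% to within the rounding of P and R.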

4.2 Sentiment Event Tracking

Temporal sentiment identification from social events was carried out by Fukuhara et al. (2007), who analyzed the temporal trends of sentiments and topics in a timestamped text archive of Weblog and news articles. Their system produces two kinds of graphs: a topic graph that shows the temporal change of topics associated with a sentiment, and a sentiment graph that shows the temporal change of sentiments associated with a topic. Mishne and de Rijke (2006a, b) proposed a system, MoodViews, to analyze temporal change using the 132 sentiments used in LiveJournal. With respect to information visualization, Havre et al. (2002) proposed ThemeRiver, a system that visualizes thematic flows along a timeline.

The temporal relations between events associated with similar or different types of sentiments, and the visualization of sentiment flows on events based on temporal expressions, were studied in Das and Bandyopadhyay (2011a) by incorporating the knowledge of temporal relations (e.g., AFTER, BEFORE, OVERLAP). The authors proposed a task to identify the temporal relations between the events that occur in two consecutive sentences and to classify the event pairs into their respective temporal classes. By incorporating the sentiment property into the set of other standard features, the proposed system outperforms all the participating state-of-the-art systems of TempEval 2007. Positive or negative coarse-grained sentiments, as well as fine-grained sentiments in the form of Ekman’s six basic universal emotions, are assigned to the events. Based on the temporal relations, the events of each document are represented as a directed graph that shows the shallow path for identifying sentiment changes over events. A separate graph is generated for each document of the TempEval-2007 event corpus. The sentiment of each sentence is assigned to the event it contains, and each event is represented as a graph node. Event nodes with similar sentiments are connected to their corresponding sentiment hubs based on their annotated sentential sentiment tags.
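The event graph with sentiment hubs can be sketched in Python as follows. This is only an illustration under stated assumptions, not the representation used in the cited work: the function name build_event_graph is hypothetical, edges are directed from the earlier event to the later one, an OVERLAP pair keeps its sentence order, and the sentiments attached to e28, e17 and e23 are made up for the example.

```python
from collections import defaultdict


def build_event_graph(event_sentiments, temporal_relations):
    """Build directed temporal edges and sentiment hubs for one document.

    event_sentiments: dict mapping event id -> sentiment tag of its sentence.
    temporal_relations: list of (first_event, relation, second_event) triples.
    """
    edges = []
    for first, relation, second in temporal_relations:
        if relation == "BEFORE":
            edges.append((first, second))
        elif relation == "AFTER":
            # "first AFTER second" means second precedes first.
            edges.append((second, first))
        elif relation == "OVERLAP":
            # Assumption: overlapping events keep their sentence order.
            edges.append((first, second))
        else:
            raise ValueError(f"unsupported relation: {relation}")

    # A hub collects every event node that shares the same sentiment tag.
    hubs = defaultdict(list)
    for event_id, sentiment in event_sentiments.items():
        hubs[sentiment].append(event_id)
    return edges, dict(hubs)


# Made-up sentiments and relations for three events.
events = {"e28": "positive", "e17": "negative", "e23": "negative"}
relations = [("e28", "BEFORE", "e17"), ("e23", "AFTER", "e17")]
edges, hubs = build_event_graph(events, relations)
print(edges)
print(hubs)
```

Normalising AFTER into a reversed edge keeps every edge pointing forward in time, so a path through the graph can be read as a temporal ordering of events.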

The tracking of sentiments includes the sentiment twist and the sentiment transition. A sentiment twist is the change of sentiment between two consecutive events, whereas a sentiment transition spans more than two sentiment events (as shown in Fig. 4). The sentiment change is identified from the AFTER, BEFORE and OVERLAP temporal relations. The ambiguities of the OVERLAP relation are identified using the BEFORE-OR-OVERLAP and OVERLAP-OR-AFTER relations. The TimeML corpus contains fewer instances of sentiment transitions than of sentiment twists. Hence, a sentiment transition is identified from the sentiment twists of the intermediate event pairs in an event chain.

Fig. 4 Sentiment tracking among three sentiment events (e28, e17 and e23)
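The distinction between a sentiment twist and a sentiment transition can be made concrete with a short Python sketch. It assumes the events of a chain are already in temporal order and already carry sentiment tags; the function names and the sentiments given to e28, e17 and e23 are hypothetical and are not taken from the cited work.

```python
def sentiment_twists(chain):
    """Return consecutive event pairs whose sentiments differ.

    chain: list of (event_id, sentiment) tuples in temporal order.
    """
    return [(a, b) for a, b in zip(chain, chain[1:]) if a[1] != b[1]]


def sentiment_transition(chain):
    """Return the sentiment path over a chain of more than two events.

    The path is derived from the twists of the intermediate pairs:
    consecutive events with the same sentiment are collapsed. An empty
    list is returned for shorter chains or chains without any twist.
    """
    if len(chain) < 3 or not sentiment_twists(chain):
        return []
    path = [chain[0][1]]
    for _, sentiment in chain[1:]:
        if sentiment != path[-1]:
            path.append(sentiment)
    return path


# Made-up sentiments for a three-event chain.
chain = [("e28", "positive"), ("e17", "negative"), ("e23", "negative")]
twists = sentiment_twists(chain)
transition = sentiment_transition(chain)
print(twists)
print(transition)
```

Here the only twist lies between e28 and e17, and the transition over the whole chain follows from that single intermediate twist.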

5 Conclusions

This chapter gave an introduction to emotion analysis, its foundations and representative state-of-the-art approaches, and discussed the main problems and issues. Owing to its many challenging research problems and wide variety of practical applications, emotion analysis has been a very active research area in recent years.

We mainly focused on feature-based emotion analysis, which exploits the full power of the abstract model being built. Results of an implemented medium-sized prototype were reported in terms of standard metrics such as F-score and accuracy.

Finally, it is important to highlight that all emotion analysis tasks are very challenging from the perspective of social computing, and our understanding of and expertise in the issues that arise are still limited. The main reason is that emotion analysis is a natural language processing task, and natural language processing has no easy problems. In addition, the most effective machine-learning algorithms (e.g., CRF, SVM) still do not produce human-understandable results: although they may achieve improved accuracy, we know little about how and why, apart from some basic knowledge gained in the manual feature selection process.