1 Introduction

Emotions have been widely studied in psychology and the behavioral sciences, as they are considered an important element of human nature. An emotion represents the psychological state of a person, which normally depends on internal factors, such as the person's mental and physical status, and on external factors, such as social and sensory feeling [15]. Identifying emotions in natural language text has drawn the attention of several information processing communities, since emotion plays a vital role in human intelligence, decision making, social interaction, awareness, learning, creativity, etc. Analysis of the emotional content of text reveals opinions, attitudes, evaluations and inclinations. Such analysis has focused on recognizing the positive and negative orientation of a person with respect to various topics. Researchers in human-computer interaction have also studied facial expressions, recognition of emotions using sensors, opinion mining, market analysis, etc. Online chat systems and blogs are now considered repositories of text with emotional content. Future human-computer interaction is expected to emphasize naturalness and effectiveness by integrating models of human cognitive capabilities, including emotional analysis and generation. Natural language processing researchers have made several efforts to identify emotion at different levels of granularity, such as word, sentence or document [5][1][2], using reviews, news, question answering, information retrieval, etc. A model is proposed in [9] that estimates the emotions in text by considering the relations among words in a sentence; it uses symbolic clues as well as natural language processing techniques for word-, phrase- and sentence-level analysis. In [8], both supervised and unsupervised machine learning classification techniques have been applied to blog data for comparative evaluation. 
Here, a Support Vector Machine (SVM) has been used to identify the intensity of the community mood. In [11], a corpus of short stories manually annotated with sentiment tags has been used for automatic emotion-based classification of sentences. The above works focus on the genre of fiction with only sentence-level emotion annotations, and they do not identify emotion indicators within a sentence. In [6], an approach is proposed that considers the semantics of the text to identify emotions at the sentence level using real-world knowledge from a commonsense knowledge base. Sentences that contain emotional information are extracted from the knowledge base. This information is then used to build emotional models of text, which label each sentence with a six-tuple corresponding to Ekman's [4] six basic emotions. A topic- and genre-independent approach that identifies emotion through the importance of verbs and adjectives has been proposed in [14]. Here, each blog post is classified as objective, subjective-positive or subjective-negative. The Yahoo! Kimo Blog has been used as a corpus in [3] to build emotion lexicons, with emoticons used to identify the emotions associated with textual keywords. A system has been proposed for classifying news articles according to the reader's emotions [7]. An emotion classification task on web blog corpora using SVM and CRF machine learning techniques is carried out. It has been observed that CRF classifiers outperform SVM classifiers in document-level emotion detection. In [10], the characterization of words and phrases according to their emotive tone has been described. The system classifies reviews into two types, recommended and not recommended, using the semantic orientation of the phrases in the review. However, in many text domains, the values of individual phrases may bear little relation to the overall sentiment expressed by the text. 
In [2], emotions are extracted based on the WordNet Affect list and dependency relations using intensities. An SVM-based supervised framework is employed that incorporates different word- and context-level features. In [1], emotion analysis on blog texts has been carried out on the English SemEval 2007 affect sensing corpus, which contains only news headlines. A Conditional Random Field (CRF) based classifier has been applied to recognize six basic emotion tags for the words of a sentence. A score-based technique has been adopted to calculate and assign tag weights to each of the six emotion tags. Since emotion is a subjective entity and a sentence may carry multiple emotions, classifying a sentence by mood is a hard task, and the above approaches to sentence classification achieve only modest performance in this domain. Most of the machine learning based models discussed above take the sentence as their basic constituent, whereas our proposed approach deals with words and phrases within sentences for fine-grained pattern analysis.

Based on the above discussion, it is observed that the words in a sentence play an important role in tracing emotions and in finding the cues that generate them. However, in many text domains, phrases are given less weight in the sentences. In our approach, phrases, like words, are considered semantic units of emotional expression and are used to identify emotional patterns at the sentence level. We mainly focus on the characteristics of emotion triggered (ET) terms, on the role of the terms co-occurring with an ET term in a phrase for sentential emotion recognition, and on the patterns that effectively contribute to positive and negative emotions in a sentence. The proposed approach considers the POS features of ET terms and of their co-occurring terms. A supervised framework is employed to classify the sentences into positive and negative emotional patterns. The proposed approach achieves encouraging results in obtaining emotional expressions and positive and negative emotion patterns on a benchmark dataset. The rest of the paper is organized as follows. The proposed work is presented in Section 2 and the experimental results in Section 3. Section 4 concludes the paper.

2 Proposed Work

Emotion analysis is treated as a pattern identification problem at the sentence level. The main objective of the proposed approach is to identify patterns of emotion with respect to positive and negative orientation at the sentence level. Sentences are constructed from a large number of terms, and only certain terms represent emotions. These terms convey the degree of emotional content together with other related hints in their surroundings and are referred to as emotion triggers (ET). We consider emotion triggers and Part Of Speech (POS) tags such as adverbs (RB), adjectives (JJ), verbs (VB), nouns (NN), intensifiers (INTF), negations (NEG), interjections (IJ) and conjunctions (CJ) as the baseline for our work.

2.1 Extracting Emotional Triggered Sentences

We consider the six basic emotions proposed by Ekman [4] in our work, namely happy, sad, anger, disgust, surprise and fear. These emotions are also expressed through facial expressions. First, we generate a list of seed words commonly used for the six basic emotions. Emotion word lists such as the WordNet Affect lists [13] and a dictionary-based English thesaurus are used as resources for analysis. These lists are used to extract the emotion triggers (ET) present in the expressions, which in turn contribute to identifying emotion patterns at the sentence level. The dataset of the Affective Text shared task on news headlines at SemEval 2007 [12] is used for analysis. The corpus consists of news headlines extracted from news web sites such as Google News and CNN and from newspapers. The training dataset of 253 sentences undergoes standard pre-processing steps, which include tokenizing, stemming and removal of stop words. The terms are stored in an inverted index and, for each term t in the inverted index, there is a posting list that contains the sentence id and the frequency of occurrence <s, f>. Let S be a set of sentences and T be the set of terms present in S. The presence of a term in a sentence may be treated as a labeling function, denoted as follows.

$$ l: T\times S\to \left\{ True, False\right\} $$
(1)

The inverted index contains ET terms as well as other terms. From Eq. (1), a term t ∈ T is present in a sentence s ∈ S if l(t, s) = True. A sample posting list, as used in retrieval applications, is extracted from the inverted index in the form <t, b, s, p>, where p is the position of term t in sentence s of blog b. Since a term can physically appear in the sentences of blogs, for a given set of ET terms such that ET ⊆ T, the relationship <ET, b, s, p> is defined as in Eq. (2).

$$ C^{D}(ET)=\left\{\langle ET, b, s, p\rangle \;\middle|\; b\in B,\ ET\subseteq T\ \text{and}\ l(ET, b)= True\right\} $$
(2)
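
The index and the two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`build_index`, `label`, `postings_for`), the toy blogs and the whitespace tokenizer are hypothetical, and the stemming and stop-word removal described earlier are omitted for brevity.

```python
from collections import defaultdict

def build_index(blogs):
    """Map each term to its postings (blog id, sentence id, position).

    blogs: {blog_id: [sentence, ...]}; tokenization is a plain whitespace
    split, which is an assumption of this sketch.
    """
    index = defaultdict(list)
    for b, sentences in blogs.items():
        for s, sentence in enumerate(sentences):
            for p, term in enumerate(sentence.lower().split()):
                index[term].append((b, s, p))
    return index

def label(index, term, b, s):
    """Labeling function l of Eq. (1): True if term occurs in sentence s of blog b."""
    return any(pb == b and ps == s for pb, ps, _ in index.get(term, []))

def postings_for(index, et_terms):
    """Tuples <ET, b, s, p> in the spirit of Eq. (2) for the given ET terms."""
    return {(t, b, s, p) for t in et_terms for b, s, p in index.get(t, [])}

# Toy data, invented for illustration only.
blogs = {
    "b1": ["storm brings fear to coast", "team wins cup"],
    "b2": ["fans happy after win"],
}
index = build_index(blogs)
et_hits = postings_for(index, {"fear", "happy"})
```

Keeping the position p in every posting is what later allows the terms surrounding an ET term to be inspected without re-reading the sentence.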

Identifying an emotional sentence containing a single emotion is easier than identifying a sentence with mixed emotions. Hence, the positional information of terms is used to identify emotions in long expressions. The terms in the inverted index are matched against the emotion word list, and the matching terms are retained as ET terms. The corresponding sentences are extracted from the corpus and referred to as ET sentences. Sentences that do not contain emotional expressions are referred to as neutral sentences. We consider only ET sentences and build a sentence repository, which holds sentences belonging to various emotions. Fig. 1 shows the inverted index and the sentence repository, with Web blogs as input. The content of the inverted index and the output sentences with ET terms are depicted.
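
As a rough sketch of this filtering step, the following Python function separates ET sentences from neutral ones. The function name, the three-word emotion list and the sample headlines are assumptions made for illustration; the actual word lists come from the resources cited above.

```python
def split_et_sentences(sentences, emotion_words):
    """Separate ET sentences from neutral ones.

    Returns (repository, neutral), where repository holds pairs of
    (sentence, [(position, ET term), ...]) and neutral holds the sentences
    in which no emotion word was found.
    """
    repository, neutral = [], []
    for sentence in sentences:
        tokens = sentence.lower().split()
        triggers = [(pos, tok) for pos, tok in enumerate(tokens) if tok in emotion_words]
        if triggers:
            repository.append((sentence, triggers))
        else:
            neutral.append(sentence)
    return repository, neutral

# Hypothetical emotion word list and headlines.
emotion_words = {"happy", "angry", "sad"}
sentences = [
    "Fans happy after cup win",
    "Council approves budget",
    "Residents angry and sad over closure",
]
repository, neutral = split_et_sentences(sentences, emotion_words)
```

A sentence with more than one trigger, such as the third headline, is a candidate for the mixed-emotion case, which is why the trigger positions are stored alongside the sentence.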

Fig. 1 Inverted index and sentence repository

The ET sentences of the repository are passed through the Stanford Parser (http://nlp.stanford.edu/software/tagger.shtml), a probabilistic lexicalized parser containing 45 different POS tags from the Penn Treebank tag set. It contains 36 POS tags and 12 other tags. Table 1 presents sample POS tagged sentences.

Table 1 POS tagged sentences

The sentence repository is large, which makes the patterns of the POS tagged sentences difficult to understand. Each sentence pattern has a different meaning with respect to emotions, depending on the context. The ET sentences are therefore classified at the sentence level based on their POS features, as shown in Fig. 2.

Fig. 2 Sentence classification based on emotion

In our approach, we consider four significant POS tags, namely adverbs, verbs, adjectives and nouns, which can hold ET terms in the sentences. These are treated as base POS tags, and their extensions, such as comparatives and superlatives, are considered to belong to the base POS tags. For instance, adjective comparative (JJR) and adjective superlative (JJS) belong to the base POS tag adjective (JJ). Along with them, intensifiers, negations, interjections and conjunctions are also used. Using these tags, sentence classification is done at several levels, as depicted in Fig. 2. In the first level of classification, the POS feature of the ET token is considered, which can appear as a noun, verb, adjective or adverb in the sentence. In the second level, the immediate co-occurrence terms of the ET token (ET ± 1) are checked for the presence of intensifiers (very, really, so, etc.), since these terms add force to the meaning of the verbs, adjectives and adverbs they modify. An intensifier enhances the word it modifies and gives it additional emotional context (e.g. It's absolutely amazing). In the third level, the co-occurrence terms of the ET token (ET ± n) are checked for the presence of negations (not, neither, none, etc.). Negation words express a negative idea to influence the reader. The presence of negation words greatly influences the other associated words in the sentence (e.g. I was happy so much that I could not control). In the fourth level, the co-occurrence terms of the ET token (ET ± n) are checked for the presence of conjunctions (and, or, but, etc.) in the sentence. Conjunctions join the parts of a sentence and can appear anywhere in it. In general, conjunctions are common in long expressions that have mixed emotions (e.g. She was surprised, but not happy about the gift). In the last level, interjections are considered for the classification. 
Interjections are exclamatory words that express emotions (e.g. Wow! Look at the sunset!). These words can be placed before or after a sentence (ET ± n) and are followed by an exclamation mark or another punctuation mark. Finally, the proposed tag-based approach hierarchically classifies the sentences into 16 classes, whose names are given in Table 2. For long expressions with mixed emotions, we consider the phrase containing the first ET token of the sentence for classification.
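
The level-by-level checks can be illustrated with a small Python sketch that derives one feature per level for the first ET token of a tagged sentence. It makes several assumptions: the cue-word sets contain only the examples mentioned in the text, the window size n = 3 is arbitrary, and the names `base_tag` and `pattern_features` are hypothetical. The sketch returns the raw features only; it does not reproduce the mapping from features to the 16 class names.

```python
BASE_TAGS = ("JJ", "RB", "VB", "NN")
# Illustrative cue-word lists; real lists would be far larger.
INTENSIFIERS = {"very", "really", "so", "absolutely"}
NEGATIONS = {"not", "neither", "none"}
CONJUNCTIONS = {"and", "or", "but"}
INTERJECTIONS = {"wow"}

def base_tag(tag):
    """Collapse an extended tag (JJR, JJS, VBN, NNS, ...) to its base POS tag."""
    for base in BASE_TAGS:
        if tag.startswith(base):
            return base
    return None

def pattern_features(tagged, et_index, n=3):
    """Features of the ET token at et_index in a list of (word, tag) pairs."""
    words = [word.lower() for word, _ in tagged]
    # ET +/- 1: the immediate neighbours, excluding the ET token itself.
    neighbours = set(words[max(0, et_index - 1):et_index] + words[et_index + 1:et_index + 2])
    # ET +/- n: the wider window, again excluding the ET token.
    window = set(words[max(0, et_index - n):et_index] + words[et_index + 1:et_index + n + 1])
    return {
        "ET_POS": base_tag(tagged[et_index][1]),
        "INTF": bool(neighbours & INTENSIFIERS),
        "NEG": bool(window & NEGATIONS),
        "CJ": bool(window & CONJUNCTIONS),
        "IJ": bool(window & INTERJECTIONS),
    }

# The two example sentences from the text, with hand-assigned tags.
amazing = [("It", "PRP"), ("is", "VBZ"), ("absolutely", "RB"), ("amazing", "JJ")]
gift = [("She", "PRP"), ("was", "VBD"), ("surprised", "VBN"), ("but", "CC"),
        ("not", "RB"), ("happy", "JJ"), ("about", "IN"), ("the", "DT"), ("gift", "NN")]
features_amazing = pattern_features(amazing, 3)
features_gift = pattern_features(gift, 2)
```

For the mixed-emotion sentence, only the first ET token ("surprised") is examined, in line with the rule stated above for long expressions.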

Table 2 Types of Sentence classes

2.2 Assigning degree of intensities to emotional patterns

It is also important to analyze positive and negative emotions, which play a vital role in analyzing the psychology of a person. The positive or negative orientation of an expression arises from the effect of the emotions. The degree of emotional content in these patterns is captured by suitably categorizing them into classes. For instance, the intensity of the ALNIES class is Negative Emotions Very High (N_EVH). The patterns of this class are long expressions (CJ) consisting of phrases that give additional force (INTF, IJ) to the emotions (ET) and expose the negative orientation (NEG) of the expression. This instance illustrates a mixed emotion that ends the expression on a negative emotion. Likewise, the intensities are interpreted for the other categories and their patterns for negative and positive emotions, as shown in Table 3.

Table 3 Intensity distribution for negative and positive emotion patterns

3 Experimental results

For the experiments, the SemEval dataset is used for performance evaluation, since it provides both training and testing benchmark data. The details of the dataset are presented in Table 4, including the number of ET sentences identified in the training dataset. The ET sentences contain both single and mixed emotions. Sentences with mixed emotions are separated from those with single emotions by recognizing multiple ET tokens and conjunction POS tags.

Table 4 Details of the SemEval dataset

We used manual annotation to assign the patterns to the 16 classes, which is a tedious but vital process. These classification annotations were made by a group of graduate and research students using an NLP tool. The training dataset is then learned by an Artificial Neural Network (ANN), which takes the POS features of the ET sentences along with their intensities as input. The output of the ANN is one of 16 classes of emotion patterns, which contain various degrees of emotion in their phrases or expressions. The richness of the positive and negative emotional content is represented by the patterns. Outliers arise from the presence of mixed emotions in a sentence, which conflict and mislead the classifier during classification. We observed that the difference between the classifier's output and the manual annotation is small, in the range of 2-7 % for each pattern. The classification accuracy is calculated using Eq. (3) to estimate the performance of the proposed approach, and the results are given in Table 5. It is observed that the classifier agrees with the human annotated classification in more than 75 % of cases for a sample dataset.

Table 5 Classification accuracy between human annotation and neural network
$$ \text{Classification Accuracy}_{n}=\frac{\text{Sentences correctly classified by the classifier to class type } n}{\text{Human annotated sentences of class type } n} $$
(3)
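
Eq. (3) amounts to a per-class ratio that is easy to compute directly. The Python sketch below is a minimal illustration; the function name `per_class_accuracy` and the class labels "A" and "B" are placeholders, not the class names of Table 2.

```python
from collections import Counter

def per_class_accuracy(human, predicted):
    """Eq. (3): for each class n, the share of human-annotated sentences of
    class n that the classifier also assigned to class n."""
    totals = Counter(human)
    correct = Counter(h for h, p in zip(human, predicted) if h == p)
    return {cls: correct[cls] / totals[cls] for cls in totals}

# Placeholder labels, invented for illustration.
human = ["A", "A", "A", "A", "B", "B"]
predicted = ["A", "A", "A", "B", "B", "A"]
accuracy = per_class_accuracy(human, predicted)
```

With these toy labels, three of the four class A sentences and one of the two class B sentences are matched, so the ratios are 0.75 and 0.5 respectively.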

We then used the testing dataset to evaluate the performance of the proposed approach in measuring positive and negative emotions. 10-fold cross validation is used to evaluate the precision, recall and F1-measure for the patterns. The average results for the positive and negative emotion classes are compared with the approaches proposed in [1] and [2], as shown in Table 6.
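
For reference, the three measures and a k-fold split can be written in plain Python as follows. This is a generic sketch, not the evaluation code behind Table 6: the function names, the "positive"/"negative" label strings and the interleaved fold assignment are assumptions.

```python
def precision_recall_f1(gold, predicted, positive="positive"):
    """Precision, recall and F1 for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; item i goes to fold i mod k."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

# Toy labels, invented for illustration.
gold = ["positive", "positive", "negative", "negative"]
predicted = ["positive", "negative", "positive", "negative"]
scores = precision_recall_f1(gold, predicted)
splits = list(k_fold_indices(20, k=10))
```

In a full evaluation, the measures would be computed on the test part of each split and then averaged over the ten folds.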

Table 6 Performance evaluation (%) for the SemEval dataset using 10-fold cross validation

Based on the results presented in Tables 5 and 6, it is observed that the proposed sentence-level pattern model captures emotional information effectively for analyzing a person's psychology. The performance of the proposed approach is encouraging when compared with other similar approaches.

4 Conclusion

The proposed approach identifies patterns of emotion based on the POS features of emotion triggered terms and their co-occurrence terms in the expression. The expressions are classified based on their POS patterns. The generated classification patterns are analyzed and grouped into positive emotions and negative emotions. Intensities are then assigned to capture the degree of emotion present in the semantics of the expression. Further, a neural network is used as the machine learning tool to learn the patterns of positive and negative emotions, which capture the psychology of a person. The performance of the proposed approach is encouraging when compared with other similar approaches.