Introduction

Social media connects us with other people, allowing us to share aspects of our lives that we value. Unarguably, this means of communication has become a part of everyday life in the digital age. In the past decade, the number of social media users has increased tremendously, as has the number of platforms designed and developed to better suit users' needs, including social networks, microblogs, Web forums, and photo/video sharing sites. The blooming of social media brings an overwhelming amount of data to the network, in combinations of text, images, videos, and more. People share their experiences and opinions through these channels, so this raw data can yield useful information when properly extracted. For these reasons, social media has become a subject of interest for many researchers [1].

At present, a large amount of raw data in social media appears in the form of text. It is impossible for humans to analyze all of those text messages and manually extract useful information from them. Computational analysis is therefore required to manage the massive volume of data. A specific set of techniques known as natural language processing (NLP) has long been employed to enable computers to understand human languages.

Many companies have invested heavily in digital media marketing, hoping to increase their sales volume [2]. To ensure the budget is well spent, they also use social monitoring as a tool to trace customer satisfaction and other feedback, as well as to keep track of customers' desire for new services and/or products. For social monitoring, NLP techniques have been applied to perform sentiment analysis of the text messages that customers provide. The findings from the analysis allow suppliers to gain better insight into customers' attitudes (positive, negative, or neutral) toward their products. In the case of a negative comment in particular, a quick response and prompt action from suppliers can demonstrate their trustworthiness. Thus, analysis of customer sentiment is necessary not only for improving products and services but also for the credibility and good image of the companies. Many commercial social media marketing tools, such as ZocialEye and Evolve24, have been developed to assist companies in keeping track of their customers [3]. More details on sentiment analysis can be found in [4].

At present, deep learning is the predominant technique for constructing prediction models in various fields of study, including NLP. Various deep learning algorithms have been adopted to deal with different types of data. For example, researchers use the convolutional neural network (CNN), the prevailing algorithm for computer vision and text analysis, to extract local structure, while resorting to long short-term memory (LSTM) and bi-directional LSTM (BLSTM) to manage sequential data and various linguistic patterns, respectively [5,6,7]. Furthermore, previous works have suggested that the quality of a prediction model can be improved when multiple features are integrated into an analysis [8,9,10], because the features can complement each other. For this reason, Pasupa and Seneewong Na Ayutthaya (2019) incorporated word embedding, part-of-speech (POS), and sentic features, which represent words as vectors, identify POS, and associate words with feelings, respectively, into various deep learning models [11]. That study's findings, based on sentiment analysis of Thai children's tales, reaffirm the claim that integrating multiple features can enhance performance. Apart from combining features, the study also compared the efficiency of various deep learning models and found CNN to be the most efficient model for sentiment analysis. In addition, Seneewong Na Ayutthaya and Pasupa (2018) attempted to fuse two deep learning models, BLSTM and CNN, in order to, firstly, examine sequences of words and, secondly, explore local features of the text [12]; the fusion resulted in higher sentiment analysis accuracy. However, these two works used only a single dataset of 40 Thai children's tales. It should be noted that most of the Thai sentiment analysis studies mentioned above and in "Related Works" were conducted on only one dataset and undertook different pre-processing steps. Consequently, it is hard to compare them and identify the most effective way to construct sentiment analysis models, because their experimental frameworks differ. Additionally, deep learning models have already been fused in various ways, e.g., [13,14,15], but those works compared their combined networks only against individual models, not against the other hybrids. Thus, models hybridized in different manners should be compared against each other.

In this paper, we aimed to construct a Thai sentiment analysis framework that fuses CNN and BLSTM in different ways and to evaluate it on three datasets. Our literature review revealed a number of research gaps that this work addresses. Our contributions are as follows:

1. To the best of our knowledge, this is the first Thai sentiment analysis study that draws its findings from more than one dataset. Precisely, we performed sentiment analysis on three datasets collected from different sources: (i) the Wisesight dataset, collected from various social media platforms such as Facebook, Twitter, and Web forums; (ii) the Thai Economy Twitter dataset, collected solely from Twitter; and (iii) 40 Thai children's tales. The first two datasets came from social media while the latter came from literature; hence, the writing styles differ.

2. To effectively analyze human sentiment, we propose that instead of relying on only one feature, a combination of features (word embedding, POS, and sentic features) should be incorporated into the analysis to increase the accuracy of sentiment predictions. This strengthens the finding of our previous work [11].

3. According to the literature, there have been different approaches to fusing models, but the hybrid models were only compared against individual models running on different datasets, not against each other. Therefore, we compared different combinations of deep learning techniques, namely CNN-BLSTM, BLSTM-CNN, BLSTM+CNN, and BLSTM×CNN, on the same framework and on several datasets. We demonstrate that among these combinations, BLSTM-CNN generated the most reliable results and is thus the best method for Thai sentiment analysis.

4. The Thai-SenticNet used in this study is the latest version, updated from the previous Thai-SenticNet2 proposed in [16]. Unlike its predecessor, this corpus draws its information from SenticNet5 and includes more words. To achieve this, we incorporated LEXiTRON [16], Volubilis [17], and Thai-WordNet [18] to translate between Thai and English. As a result, we successfully constructed Thai-SenticNet5.

This paper is arranged as follows: "Related Works" reviews the literature related to our work. The proposed framework is explained in "Proposed Framework," which covers the data pre-processing steps, the feature extraction process, and the hybrid deep learning models. "Experimental Framework" describes the experimental setup and datasets, followed by the results and discussion in "Results and Discussions." Finally, we conclude our work in "Conclusion."

Related Works

Many researchers have investigated sentiment analysis under various types of learning problems, such as supervised learning [11, 12, 19], unsupervised learning [20], semi-supervised learning [21], and reinforcement learning [22]. Most studies in this line of research have been conducted on English texts [23, 24]; few have addressed other languages, e.g., German, French, Japanese, and Chinese [25,26,27,28]. Recently, a framework for multilingual sentiment analysis called BabelSenticNet was proposed [29]. It translates the SenticNet corpus via a statistical machine translation tool into 40 languages based on WordNet and its multilingual versions. As for Thai, even though sentiment analysis was introduced for Thai texts a decade ago [30], the number of Thai sentiment analysis studies is still limited because analyzing Thai text effectively requires multiple pre-processing steps. It is a challenge for researchers to build Thai NLP tools that deal with the lack of word delimiters and sentence boundary markers, Thai slang, and so on. Doing so requires a fine-grained Thai corpus that enables researchers to perform word segmentation, POS tagging, named entity recognition, and syntactic parsing, among other tasks. Unfortunately, the tools and resources supporting Thai sentiment analysis are still inadequate [31]. In 2010, Thai text sentiment analysis was first conducted using term frequency as an input feature [32, 33]. Since then, Thai sentiment analysis research has continued [34,35,36,37,38]. Most of the studies rely on corpora that apply a dictionary-based technique for feature extraction. In those corpora, words are categorized into three groups (positive, neutral, and negative) and tagged with a label of −1, 0, or +1. However, three labels are, in fact, insufficient to describe human sentiment. Therefore, Lertsuksakda et al. (2014) proposed a corpus with finer-grained weights, ranging from −1 to +1, for Thai sentiment analysis [16]. The corpus was constructed based on SenticNet2, proposed by Cambria and his colleagues [39]. To translate English terms into Thai and verify the Thai meanings obtained from the translation process, that study adopted a bi-directional translation technique. Sentic features were then extracted from sentences and used to analyze sentiment in Thai children's tales [19, 40].

Deep learning has played a major role in sentiment analysis tasks. An important feature used with deep learning is word embedding, which transforms words into vectors; each dimension of such a vector represents a meaning or context of the word. Word embedding can be produced by a Word2Vec model [41]. Besides word embedding, several other features have been used in sentiment analysis, such as term frequency, POS, and sentic features. One of our previous works combined word embedding with other features, such as the POS tag feature, which identifies the grammatical type of a word in a sentence, and the sentic feature, which represents the emotion of a word in vector form [11]. Those combinations clearly improved the performance of our model for sentiment classification of Thai tales. Furthermore, consolidating sentiment information into the text embedding process can yield better representations for sentiment analysis [42].

Conventional deep learning models include CNN and LSTM. A CNN model is a feed-forward neural network and a long-time favorite for computer vision tasks. A CNN processes groups of pixels that compose image data; for NLP tasks, it processes groups of words instead [43]. An LSTM model is a recurrent neural network (RNN) that can learn sequential data, such as sequences of words in an NLP task. Normally, an LSTM model learns sequential data in the forward direction, but in some cases a pattern needs to be learned in the backward direction as well, so BLSTM was developed to handle this; it learns sequential data in both forward and backward directions. Many research studies have shown that BLSTM performs better than LSTM [44,45,46]. CNN, LSTM, and BLSTM models have all been used in sentiment analysis tasks. Ouyang et al. (2015) used a CNN model to perform sentiment classification on a movie review dataset [47] and showed that the CNN model was more accurate than shallow classification algorithms such as naïve Bayes and support vector machines [48]. Nowak et al. (2017) compared LSTM against BLSTM models in sentiment classification of an Amazon book review dataset and found that the BLSTM model was more accurate than the LSTM model in this task [49].

Besides studies using individual models, there have been studies that used combinations of models to improve performance. Wang et al. [13] showed that a combination of a CNN model with an LSTM model yielded a lower error measure than individual CNN and LSTM models alone in predicting the valence-arousal value (dimensional sentiment in numerical form) of the Stanford Sentiment Treebank (English) [50] and the Chinese Valence-Arousal Text corpus (Chinese) [51]. Lin et al. (2017) showed that a combination of BLSTM with CNN provided the best performance among BLSTM, CNN, CNN-LSTM, LSTM-CNN, and LSTM in predicting the type of customer feedback in the IJCNLP 2017 Shared Task on Customer Feedback Analysis dataset [14]. Minaee and colleagues [15] showed that an ensemble of CNN and LSTM models provided a higher accuracy than the individual CNN and LSTM models alone in sentiment classification of an IMDB review dataset [52] and the Stanford Sentiment Treebank dataset.

Sentiment analysis is normally performed at a coarse level, i.e., the document or sentence level. Recently, aspect-level sentiment analysis has been proposed [6], since a sentence may convey multiple feelings, for example, "Bad service but really good food." Here, there are two aspects, "service" and "food": the customer has a positive sentiment toward the food but a negative sentiment toward the service of this restaurant. Aspect-level sentiment analysis therefore aims to understand the sentiment toward a particular aspect term. This can be achieved by integrating an attention mechanism into learning models [53]. The mechanism imitates human attention behavior in reading, focusing on the context words that draw attention. Wang et al. (2016) employed the attention mechanism on LSTM and proposed a model called Attention-based LSTM with Aspect Embedding [6]. The aspect (target word) embedding is concatenated with the word embedding vector and fed into the LSTM layer, and it is also concatenated with the hidden state vector and fed into the attention layer. Ma et al. (2017) proposed an Interactive Attention Network that utilizes two LSTM models to separately learn context and target words [54]. The hidden states of both models are then interactively learned through the attention mechanism and combined. Both studies showed that employing an attention mechanism in LSTM clearly improves the overall performance of aspect-level sentiment analysis on SemEval 2014 Task 4 [55].

Proposed Framework

The framework for Thai sentiment analysis in this experiment consisted of three main parts: (i) data pre-processing, (ii) feature extraction, and (iii) learning model, as shown in Fig. 1.

Fig. 1 Thai sentiment analysis framework

Data Pre-Processing

Data Cleansing

To boost the performance of sentiment classification, a text must first be passed through a cleansing process to get rid of noise that can degrade downstream processes, especially tokenization. The text input in our work was processed in the following ways: (a) any English words were changed from upper case to lower case; and (b) because the text data used in this experiment, such as the Thai Economy Twitter and Wisesight datasets, were collected from social media, the text contained some Uniform Resource Locators (URLs). Many URLs consist of long strings of characters and numbers without any useful meaning, i.e., they are quite noisy, so we replaced every URL with the token "xxurl."
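A minimal sketch of this cleansing step follows. The regular expression and function name are ours, not the paper's; only the "xxurl" placeholder token comes from the text above.

```python
import re

# Matches common http(s) and www-style URLs; an illustrative pattern only.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def clean_text(text: str) -> str:
    text = text.lower()                    # (a) lower-case any English words
    text = URL_PATTERN.sub("xxurl", text)  # (b) replace noisy URLs with "xxurl"
    return text

print(clean_text("Check THIS out: https://example.com/x?y=1"))
# -> "check this out: xxurl"
```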

Word Tokenization

Each word in a sentence must be separated before being fed into the model as input; this process is called word tokenization. Unlike English, where two adjacent words are separated by a space, Thai has no explicit word delimiters, so splitting Thai words in a sentence requires a special technique. In this experiment, we split words using a technique based on the maximum matching algorithm from the PyThaiNLP library [56]. The maximum matching algorithm implemented in PyThaiNLP is a dictionary-based approach: it scans a series of input characters and matches them with words in a dictionary [57,58,59], then employs breadth-first search to select the segmentation that contains the minimum number of word tokens. Tokens that are not in the dictionary are segmented into Thai character clusters. A Thai character cluster is an unambiguous, indivisible unit smaller than a word; clustering is performed by the character clustering algorithm proposed by Theeramunkong and his colleagues [60], which utilizes a set of simple rules based on the types of Thai characters. After tokenization, all tokens (except emoticons) are fed into the spell-check algorithm proposed by Peter Norvig [61]. The algorithm finds possible permutations of each original word within an edit distance of two (inserts, replaces, transposes, deletes) and selects the candidate with the highest frequency from the list of edited words that match words in the dictionary.
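In code, this step might look as follows with PyThaiNLP, assuming its default "newmm" engine corresponds to the dictionary-based maximum-matching tokenizer described above, and that pythainlp.spell.correct implements the Norvig-style checker:

```python
from pythainlp.tokenize import word_tokenize
from pythainlp.spell import correct

sentence = "ผมชอบกินข้าวผัด"   # "I like to eat fried rice"
tokens = word_tokenize(sentence, engine="newmm")
print(tokens)                   # e.g. ['ผม', 'ชอบ', 'กิน', 'ข้าวผัด']

# Spell-correct each token (in the real pipeline, emoticons are skipped).
corrected = [correct(t) for t in tokens]
```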

POS Tagging

In this experiment, we used a POS tagging process to identify the types of words in a sentence, as needed for constructing POS tag features. Tagging was performed with a model based on the Perceptron Tagger from the PyThaiNLP library. This model categorizes words into 47 types based on the ORCHID corpus, a Thai POS-tagged corpus [62]. However, 47 word types seemed too complex and could be difficult for a model to learn. Therefore, the 47 ORCHID types were mapped to the 17 Universal POS tags defined by Universal Dependencies (UD) [63]. These 17 types are simple, easily comprehensible, and widely used across many languages. In the mapping process, however, only 15 UD types were actually produced, because no ORCHID type mapped to two of the UD types, "symbol" (SYM) and "other" (X), as shown in Table 1. Please note that the mapping function is part of the PyThaiNLP library.

Table 1 POS mapping between ORCHID and UD POS tags [56]

Since social media data often expresses the sentiment or emotion of users with emoticons, we added an "EMOJI" type as an additional POS tag type to identify emoticons in sentences. Emoticons are directly associated with emotion and are thus beneficial to sentiment analysis [64]. All emoticons were mapped to their English names via an emoji library [65]. In addition, we used a padding process, explained in "Padding," and the tokens added by this process were given another type, "PAD." Therefore, we categorized POS tags into 19 types, which differ slightly from the original 17 UD types.

Regarding emoticons, the sentic vector of each emoticon was computed from its name as mapped by the emoji library. If an emoticon's name was longer than one word, its sentic vector was the average of the sentic vectors of all the words in the name.
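A sketch of the tagging step follows, assuming PyThaiNLP's "orchid_ud" corpus option performs the ORCHID-to-UD mapping described above, and using the emoji library to map emoticons to their English names:

```python
from pythainlp.tag import pos_tag
import emoji

tokens = ["ผม", "ชอบ", "กิน", "ข้าวผัด"]
print(pos_tag(tokens, engine="perceptron", corpus="orchid_ud"))
# e.g. [('ผม', 'PRON'), ('ชอบ', 'VERB'), ('กิน', 'VERB'), ('ข้าวผัด', 'NOUN')]

# Emoticons get their own "EMOJI" tag; their sentic vector is the average
# over the words of the emoji's English name.
name = emoji.demojize("😂", delimiters=("", ""))  # 'face_with_tears_of_joy'
name_words = name.split("_")                       # words to average over
```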

Sentic Tagging

A feature that represents the emotion of a word is called a sentic feature. The sentic value of a word is encoded in a 5-dimensional vector consisting of four affective dimensions and a polarity, as presented by [66]. The emotion represented by each dimension is explained in detail in "Sentic." Sentic vectors representing the English sentiment lexicon can be obtained from SenticNet, which is now in version 5.0. For use with the Thai language, we needed to construct a new Thai-SenticNet, built on the two following concepts: bi-directional translation and Thai-WordNet.

1. Thai words were aligned with English words, and their alignment was verified with a bi-directional translation technique [67] based on the Bi-LEXiTRON [16] and Bi-Volubilis [17] corpora. Furthermore, several words were added from Thai-WordNet [68]: Thai words that aligned with English words in a Synset.

2. New entries were constructed by deleting some stop words [69] from entries in our corpus (details about the corpus are below) and adding the resulting words back to the corpus to make it more comprehensive.

In this work, we used the following corpora to map English sentic vectors to Thai sentic vectors: (i) bi-directional LEXiTRON (Thai-English) [16]; (ii) bi-directional Volubilis 11K (Thai-English) [17]; (iii) Volubilis-100K [17]; (iv) Thai-WordNet [18]; and (v) SenticNet5 [70]. Our final corpus contained 23,093 successfully verified Thai words and 15,247 sentic vectors. We called it Thai-SenticNet5.

The purpose of Thai-SenticNet5 was to accurately map the sentic values of English words in SenticNet5 to corresponding Thai words in our constructed corpus. Several Thai-English dictionaries and corpora exist, such as LEXiTRON, Volubilis, and Thai-WordNet. We wanted our corpus to contain as many Thai words as possible, so we combined the three Thai corpora mentioned above under the constraint that each translated word had to be verified by the bi-directional translation technique: the meaning of every Thai word in the corpus must remain the same when it is translated into English and then back into Thai [16].

First, we considered the Thai words in Volubilis-100K, the biggest corpus in this experiment with 107,607 entries. We then merged duplicate words that had more than one entry (i.e., words listed in different POS sections of the corpus) into single entries, reducing the corpus to 100,107 words. Words in this modified Volubilis-100K were not verified by the bi-directional translation technique; therefore, we selected only the Thai words from this corpus and matched them with translated English words from the following corpora, which had already been verified by the bi-directional translation technique:

1. the Thai-WordNet corpus, created by aligning Princeton WordNet's Synsets with Thai words using a bilingual dictionary;

2. the LEXiTRON-Volubilis-Bi corpus, created by merging LEXiTRON-Bi [16] (2871 words) with Volubilis-Bi [17] (11,065 words). Please note that the Thai words in Volubilis-Bi were a subset of the 11,820 entries in Volubilis-100K.

Afterwards, we merged the Thai words from the modified Volubilis-100K (100,107 words) with the list of Thai words from LEXiTRON-Bi (we did not merge the Thai words from Volubilis-Bi because they were a subset of the words in Volubilis-100K), obtaining 100,118 Thai words. To get even more Thai words, we deleted stop words from each relevant entry in the list and added the resulting entries to the list, giving 119,281 Thai words. Let us call this list ThaiWordList.

After that, we created a verified dictionary by aligning the Thai words in ThaiWordList with sets of English words in the Thai-WordNet and LEXiTRON-Volubilis-Bi corpora. We matched each Thai word in ThaiWordList with a set of English words in Thai-WordNet, provided the Thai word had a corresponding English word in a Synset in Thai-WordNet. If it did not, we matched it with a set of English words in LEXiTRON-Volubilis-Bi instead. This yielded 23,093 matches. Finally, we created the Thai-SenticNet5 corpus by mapping the set of English words for each Thai word to the SenticNet5 corpus to obtain a set of sentic values and an average sentic value for the Thai word. The result was the 15,247-word Thai-SenticNet5 corpus, with a sentic vector for every Thai word in it.

The construction of the Thai-SenticNet5 corpus is shown as pseudocode in Algorithm 1, and a schematic sketch follows Table 2. Table 2 shows the number of verified words and the number of words with a sentic vector in each of the mentioned corpora; the number of verified words with an associated sentic vector increased after every step of the construction process. Thai-SenticNet5 is available for download at https://github.com/dsmlr/ThaiSenticNet5.

Algorithm 1 Construction of the Thai-SenticNet5 corpus
Table 2 Number of verified words and the number of words that had a sentic vector in each corpus
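The matching procedure can be summarized in Python. This is a schematic under our reading of the text, not the authors' exact Algorithm 1; all corpora are represented as hypothetical dictionaries ({thai_word: set of English words} and {english_word: 5-dim sentic vector}).

```python
import numpy as np

def build_thai_senticnet5(thai_word_list, thai_wordnet,
                          lexitron_volubilis_bi, senticnet5):
    corpus = {}
    for thai_word in thai_word_list:
        # Prefer the Thai-WordNet alignment; fall back to LEXiTRON-Volubilis-Bi.
        english = thai_wordnet.get(thai_word) or \
                  lexitron_volubilis_bi.get(thai_word)
        if not english:
            continue                      # no verified translation
        vectors = [senticnet5[w] for w in english if w in senticnet5]
        if vectors:                       # average sentic vector per Thai word
            corpus[thai_word] = np.mean(vectors, axis=0)
    return corpus
```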

Padding

A set of words has to be transformed into vector data before being fed into the model. The vector data are processed in chunks (batches), and every sample vector in a batch must be of the same size. Therefore, we padded shorter sequences with 〈PAD〉 tokens so that every sample vector in a batch had the same size.
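A minimal padding sketch (the 〈PAD〉 token is written <PAD> here):

```python
def pad_batch(batch, pad_token="<PAD>"):
    # Extend every token sequence to the length of the longest one.
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_token] * (max_len - len(seq)) for seq in batch]

print(pad_batch([["a", "b", "c"], ["d"]]))
# [['a', 'b', 'c'], ['d', '<PAD>', '<PAD>']]
```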

Feature Extraction

Word Embedding

Deep learning models are a subset of neural network models. They are mathematical models that cannot learn directly from raw text data; they can only learn from vector data, so a word must first be transformed into a vector. This transformation is called word embedding and can be done by models such as Word2Vec, GloVe, and ULMFiT. There are two conventional Word2Vec models: the Continuous Bag-of-Words model, which uses context words (words surrounding the target word) as input to predict the target word, and the Skip-Gram model, which uses the target word to predict the context words [71]. An efficient Word2Vec model should be trained on a large corpus. The GloVe model [72] learns word vectors by using information from word co-occurrence probabilities at the global level (the whole dataset) and gives good results when trained on a large corpus. However, training a model on a large corpus consumes a lot of time. Fortunately, a technique has been proposed for fine-tuning a language model so that knowledge gained from one task can be transferred to other NLP tasks, which means the language model does not need to be trained from scratch. This technique is called Universal Language Model Fine-tuning (ULMFiT) [73].
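Although we ultimately used a pre-trained model (next paragraph), the two conventional Word2Vec variants can be illustrated with gensim, used here purely as an example (the paper does not name a library); the toy corpus stands in for a large pre-tokenized one:

```python
from gensim.models import Word2Vec

sentences = [["ผม", "ชอบ", "กิน", "ข้าวผัด"],
             ["เด็ก", "ชอบ", "อ่าน", "นิทาน"]]

# sg=0 trains Continuous Bag-of-Words; sg=1 trains Skip-Gram.
cbow = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1)

vec = cbow.wv["ชอบ"]   # a 300-dimensional word vector
```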

The datasets used in this experiment were relatively small, so it was difficult to train an efficient model from scratch; hence, we used a pre-trained language model to transform words into vectors. The pre-trained language model came from the thai2fit library [74]: an ASGD Weight-Dropped LSTM model [75] trained by the ULMFiT method on a Thai Wikipedia dataset. The pre-trained embedded word vectors had 300 dimensions.

POS Tagging

In this experiment, we used a POS tag feature that represents the type of a word in a sentence in one-hot vector form. The number of dimensions of the one-hot vector equals the number of POS types. For each POS tag type, the corresponding one-hot vector has a value of 1 in the dimension for that type and 0 in all other dimensions. We categorized POS tags into the 17 UD POS tag types shown in Table 1 plus two additional types, EMOJI and PAD.
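A sketch of this encoding follows; the ordering of the tag list is our assumption, while its 19 members (17 UD tags plus EMOJI and PAD) come from the text:

```python
import numpy as np

POS_TYPES = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN",
             "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM",
             "VERB", "X", "EMOJI", "PAD"]

def pos_one_hot(tag: str) -> np.ndarray:
    # 19-dimensional vector with a single 1 at the index of the word's tag.
    vec = np.zeros(len(POS_TYPES))
    vec[POS_TYPES.index(tag)] = 1.0
    return vec

print(pos_one_hot("VERB"))
```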

Sentic

The sentic feature is a five-dimensional vector composed of four affective dimensions, based on the Hourglass of Emotions model, plus a polarity. The model follows psychological principles that relate brain activity to changes in mood [76, 77]. The four dimensions are sensitivity (Snst), aptitude (Aptit), attention (Attnt), and pleasantness (Plsnt). The polarity of a word can be calculated by:

$$ p = \sum_{i=1}^{N} \frac{Plsnt(\kappa_{i}) + \lvert Attnt(\kappa_{i}) \rvert - \lvert Snst(\kappa_{i}) \rvert + Aptit(\kappa_{i})}{3N}, $$
(1)

where N is the total number of word concepts (concepts that describe perceived objects or actions) and $\kappa_i$ is the i-th input concept. Here, p lies in the range [−1, 1], from extremely negative to extremely positive emotion.
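A worked instance of Eq. (1) for a single concept (N = 1), with made-up sentic values:

```python
# Hypothetical sentic values in [-1, 1] for the four affective dimensions.
plsnt, attnt, snst, aptit = 0.6, 0.3, -0.2, 0.5

N = 1
p = (plsnt + abs(attnt) - abs(snst) + aptit) / (3 * N)
print(p)   # 0.4: a moderately positive polarity
```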

This research obtained the sentic values from the SenticNet5 corpus, a sentiment lexicon at the concept level that employs BLSTM to infer primitives by lexical substitution. Feature vectors were extracted using the Thai-SenticNet5 corpus, as explained in "Sentic Tagging." Any Thai word that does not exist in the corpus is represented by a 5-D zero vector.

Learning Model

Bi-Directional Long Short-Term Memory

One of the powerful algorithms in the RNN family is BLSTM, which is able to learn sequential data in both forward and backward directions. It is an extension of LSTM. RNNs are known to suffer from the vanishing gradient problem on long data sequences [78]; LSTM was introduced to deal with this problem.

LSTM processes data in the forward direction with the ability to remember and forget information. An LSTM model consists of the following components: forget gate ($f_t$), input gate ($i_t$), input modulation gate ($\tilde{c}_t$), cell state ($c_t$), output gate ($o_t$), and hidden state ($h_t$). The forget gate ($f_t$) enables the model to reset itself, i.e., to forget old information at an appropriate time. When a new sample ($x_t$) arrives, it is considered together with the previous hidden state ($h_{t-1}$) to decide how much information should be forgotten. A sigmoid function is employed for this task; its value ranges from 0 to 1, corresponding to completely forgetting or fully remembering the previous information, respectively:

$$ f_{t} = sigmoid(W_{f} [h_{t-1}, x_{t}] + b_{f}). $$
(2)

The input gate ($i_t$) decides which information will be updated, again considering the new sample together with the previous hidden state ($h_{t-1}$). The sigmoid function decides how much new information should be updated, with values from 0 to 1:

$$ i_{t} = sigmoid (W_{i} [h_{t-1}, x_{t}] + b_{i}). $$
(3)

The input modulation gate ($\tilde{c}_t$) produces a candidate cell state that learns from both the new information ($x_t$) and the previous hidden state ($h_{t-1}$). It utilizes the tanh activation function and creates a vector of new candidate values:

$$ \tilde{c_{t}} = tanh (W_{c} [ h_{t-1}, x_{t} ] + b_{c} ). $$
(4)

The cell state ($c_t$) is a long-term memory cell that combines the old information ($c_{t-1}$), scaled down by the forget gate, with the new information, which is the product of the input gate ($i_t$) and the input modulation gate ($\tilde{c}_t$):

$$ c_{t} = f_{t}\cdot c_{t-1} + i_{t} \cdot \tilde{c_{t}}. $$
(5)

The output gate ($o_t$) decides what the next hidden state should be. It sends information to the hidden state ($h_t$) after restriction to the interval [0, 1] by a sigmoid function:

$$ o_{t} = sigmoid(W_{o} [h_{t-1},x_{t}]+b_{o}). $$
(6)

The last component is the hidden state ($h_t$), referred to as the output of the LSTM. It carries the information on what the LSTM has seen:

$$ h_{t} = o_{t} \cdot tanh (c_{t} ). $$
(7)

On the other hand, BLSTM processes data in both forward and backward directions. The architecture of BLSTM is shown in Fig. 2.

Fig. 2 Bi-directional long short-term memory for sentiment analysis

In this study, once a sentence was fed into the model, it went through the embedding layer, which converted the sentence into word embedding features that were further fed to a dropout layer. The embeddings were then combined with the POS tag and sentic vectors, as shown in Fig. 3, and the combined features were fed to a BLSTM layer. The hidden states of the forward and backward directions (the last outputs of the BLSTM) were concatenated and fed to a dropout layer, to prevent over-fitting [79], before being pushed on to the output layer. A softmax activation function was used at the output layer to produce class probabilities.

Fig. 3 Word-to-vector transformation process

Comparing the operation of the BLSTM model to human reading behavior, it is like reading each word from the beginning to the end of the sentence and analyzing the sentence in both forward and backward directions. This allows humans to interpret and analyze the meaning of the sentence, including grammatical patterns that may run in either direction.
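The BLSTM branch just described can be sketched as follows. This is a minimal PyTorch illustration; the framework and layer sizes are our assumptions, not taken from the paper. The 324-dimensional input assumes the 300-d word embedding, 19-d POS one-hot, and 5-d sentic features described earlier.

```python
import torch
import torch.nn as nn

class BLSTMClassifier(nn.Module):
    def __init__(self, input_dim=324, hidden_dim=128, num_classes=3):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        self.dropout = nn.Dropout(0.5)
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):                        # x: (batch, seq_len, input_dim)
        _, (h_n, _) = self.blstm(x)              # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=1)   # concat forward/backward states
        return torch.softmax(self.out(self.dropout(h)), dim=1)

model = BLSTMClassifier()
probs = model(torch.randn(8, 20, 324))           # 8 sentences of 20 tokens
```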

Convolutional Neural Network

CNN is a feed-forward neural network with at least one convolutional layer as its core component; this layer automatically generates feature maps by sliding a filter over an image. Another important component is the pooling layer, which is employed to reduce the size of the feature maps. Applied to text, a CNN is able to capture local features. The architecture is shown in Fig. 4.

Fig. 4 Convolutional neural network for sentiment analysis

An input feature vector is first fed into the convolutional layer, which allows the model to learn information from groups of words through a striding filter. The striding (sliding) filter has dimensions w × h, where w is the length of the feature vector and h is the number of words the filter covers at a time. This leads to an output of size s × n, where n is the number of nodes in the convolutional layer and s is the number of strides, equal to l − (h − 1), with l being the number of words in the sentence. The output of the convolutional layer then passes through the Rectified Linear Unit (ReLU) activation function [80]. Because the vector passed to the output layer has to be one-dimensional, 1-D dynamic max pooling with a size of s × 1 is applied; it strides n times and gives a 1-D output vector that goes to the dropout layer and then to the output layer.
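A minimal PyTorch sketch of this branch (again, framework and sizes are our assumptions): a 1-D convolution with filter height h = 3 slides over the word dimension, giving s = l − (h − 1) positions, followed by ReLU and max pooling over time.

```python
import torch
import torch.nn as nn

class CNNClassifier(nn.Module):
    def __init__(self, input_dim=324, n_filters=128, h=3, num_classes=3):
        super().__init__()
        self.conv = nn.Conv1d(input_dim, n_filters, kernel_size=h)
        self.dropout = nn.Dropout(0.5)
        self.out = nn.Linear(n_filters, num_classes)

    def forward(self, x):                      # x: (batch, seq_len, input_dim)
        x = x.transpose(1, 2)                   # Conv1d expects (batch, dim, len)
        feat = torch.relu(self.conv(x))         # (batch, n_filters, s)
        pooled = feat.max(dim=2).values         # max pooling over time
        return torch.softmax(self.out(self.dropout(pooled)), dim=1)

probs = CNNClassifier()(torch.randn(8, 20, 324))  # s = 20 - 2 = 18 positions
```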

Hybrid Deep Learning Models

We proposed the following four different hybrids of deep learning models.

BLSTM-CNN

BLSTM-CNN is a hybrid deep learning model that appends a CNN to a BLSTM. The model aims first to learn sequences of words with the BLSTM and then to capture local features with the CNN. The model is shown in Fig. 5. After a sentence is input into the model, its features are extracted and sent to the BLSTM layer to learn the sequence of the sentence in both forward and backward directions. The output of the BLSTM, which carries long-range dependency information from both directions, then goes to the CNN to extract local features of the text.

Fig. 5 BLSTM-CNN model for sentiment classification
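As a minimal PyTorch sketch of this arrangement (framework and layer sizes are our assumptions), the BLSTM's per-timestep outputs, which carry bi-directional long-range context, are fed to the convolutional layer, which then extracts local features before classification:

```python
import torch
import torch.nn as nn

class BLSTMCNN(nn.Module):
    def __init__(self, input_dim=324, hidden_dim=128, n_filters=128,
                 h=3, num_classes=3):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden_dim, n_filters, kernel_size=h)
        self.dropout = nn.Dropout(0.5)
        self.out = nn.Linear(n_filters, num_classes)

    def forward(self, x):                       # x: (batch, seq_len, input_dim)
        seq, _ = self.blstm(x)                   # (batch, seq_len, 2*hidden_dim)
        feat = torch.relu(self.conv(seq.transpose(1, 2)))
        pooled = feat.max(dim=2).values          # max pooling over time
        return torch.softmax(self.out(self.dropout(pooled)), dim=1)

probs = BLSTMCNN()(torch.randn(8, 20, 324))
```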

CNN-BLSTM

CNN-BLSTM is the reverse arrangement. The model aims first to learn local features of the text with the CNN, after which long-range dependencies between the words are learned by the BLSTM. The model is shown in Fig. 6. The output of the convolutional layer goes through the ReLU activation function, and the resulting output, with local features embedded, is fed into the BLSTM layer to learn the sequence in forward and backward directions. The hidden states of the forward and backward directions are concatenated before passing through the dropout layer and on to the output layer.

Fig. 6 CNN-BLSTM model for sentiment classification

BLSTM+CNN

This model learns the local features and the bi-directional word sequence at the same time. The model is shown in Fig. 7. An input sentence is feature-extracted and then fed into the BLSTM and CNN layers in parallel. The outputs of both layers are concatenated before going through the dropout and output layers.

Fig. 7 BLSTM+CNN model for sentiment classification

BLSTM×CNN

In this type of hybrid model, we simply ensembled both models with a soft voting scheme. The sentiment probability predicted by BLSTM×CNN is calculated by averaging the probabilities given by BLSTM and CNN; the final predicted sentiment is the class with the highest average probability. The model is shown in Fig. 8.

Fig. 8 BLSTM×CNN model for sentiment classification
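Soft voting reduces to a few lines; the probabilities below are made-up values for illustration:

```python
import numpy as np

p_blstm = np.array([0.2, 0.5, 0.3])     # hypothetical class probabilities
p_cnn = np.array([0.1, 0.3, 0.6])

p_ensemble = (p_blstm + p_cnn) / 2       # soft voting: average the two models
prediction = int(np.argmax(p_ensemble))  # -> class 2 (highest average)
```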

Experimental Framework

Datasets

The proposed hybrid models (BLSTM-CNN, CNN-BLSTM, BLSTM+CNN, and BLSTM×CNN) were compared with their individual counterparts (CNN and BLSTM) on three datasets.

Wisesight Sentiment Dataset

This set, Wisesight, was collected from public pages on Facebook, Twitter, YouTube, Pantip.com, and other Web forums between 2016 and early 2019. Most of the topics in this dataset concern consumer products and services. There were 26,740 messages divided into four classes: 6800 negative, 14,500 neutral, 4700 positive, and 500 query messages. The length of each message was between 1 and 428 words. It should be noted that this dataset was labelled by a group of annotators, with each message given one label by a single annotator. The dataset is available for download at https://github.com/PyThaiNLP/wisesight-sentiment.

Thailand Economy Twitter Dataset

This dataset, ThaiEconTwitter, was proposed by [81] and was collected from Twitter. Tweets with two hashtags, (stock) and (economic), posted between 17 April 2017 and 5 May 2017 were retrieved. The set consisted of 2000 sentences in three classes: positive, neutral, and negative sentiment. Each sentence was given a label by each of three experts. In this work, we selected only the sentences given the same label by all three experts, leaving 1041 sentences: 608 negative, 84 neutral, and 349 positive.

The 40 Thai Children’s Tales Dataset

This dataset, ThaiTales, was first used in [19]. It was collected from 40 Thai tales and consisted of 1964 sentences. Each sentence was labelled with one of three classes, i.e., positive, neutral, or negative sentiment, by three experts. All three experts agreed on the label for only 1115 sentences: 309 positive, 508 neutral, and 298 negative. The dataset is available for download at https://github.com/dsmlr/40-Thai-Children-Stories.

Experiment Settings

The performance of the proposed hybrid models, i.e., BLSTM-CNN, CNN-BLSTM, BLSTM+CNN, and BLSTM×CNN, was compared with that of the individual models, i.e., BLSTM and CNN. Three types of features and their combinations were also compared: word embedding (FW), POS tag (FP), sentic (FS), FW + FP, FW + FS, FP + FS, and FW + FP + FS. The experiments were conducted on the three datasets (ThaiTales, ThaiEconTwitter, and Wisesight) described in the previous subsection. Each dataset was split in a stratified manner into three subsets (training, validation, and test) at a ratio of 60:20:20; hence, all subsets inherited the characteristics of the original dataset, including its class distribution and sentence length distribution. We employed the Adam optimizer [82] with a learning rate of 0.001. Every model was trained for 300 epochs on the ThaiTales and ThaiEconTwitter datasets and 50 epochs on the Wisesight dataset. The reasons for the 50-epoch setting on Wisesight were that (i) the loss function converged at around the 30th epoch, as shown in Fig. 9, and (ii) the large number of samples led to a high computational cost. We performed grid searches to tune the hyperparameters of each algorithm. The search settings were as follows:

– BLSTM: the number of hidden nodes in the BLSTM layer was one of {16, 32, 64, 128, 256, 512}.

– CNN: the number of filters was one of {16, 32, 64, 128, 256, 512}, and the filter size was fixed at 3.

– BLSTM-CNN: the number of hidden nodes in the BLSTM layer was one of {16, 32, 64, 128, 256, 512}, the number of filters was one of {16, 32, 64, 128, 256, 512}, and the filter size was fixed at 3.

– CNN-BLSTM, BLSTM+CNN, and BLSTM×CNN: the settings were similar to those of BLSTM-CNN.

Fig. 9 Convergence curve of the loss function for each algorithm and dataset. Black and red indicate training and validation sets, respectively

Dropout layers were employed in all models, between the embedding layer and the output layer, with the dropout value set to 0.5. We then selected the optimal parameters based on the F1-score obtained on the validation set, as explained in the following subsection. The optimal parameters were used to train the final model on the combined training and validation sets, and this model was evaluated on the test set. The above process was repeated with 10 different random splits.
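The evaluation protocol can be sketched as follows, assuming scikit-learn's train_test_split for the stratified 60:20:20 split (the paper does not state which tool it used); X and y are toy stand-ins for the extracted features and sentiment labels:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(100)]          # toy features
y = [i % 3 for i in range(100)]        # toy 3-class labels

def stratified_split(X, y, seed):
    # 60% train, then split the remaining 40% evenly into validation and test.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

for seed in range(10):                 # 10 different random splits
    train, val, test = stratified_split(X, y, seed)
    # ... grid-search hidden sizes on val, retrain on train+val, evaluate on test ...
```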

Performance Evaluation

As our datasets were mostly imbalanced, we used F1 as the performance measure. F1 seeks to balance precision and recall; it is their harmonic mean:

$$ F_{1}= 2\cdot\frac{P\cdot R}{P+R}, $$
(8)

where P is precision and R is recall, which can be calculated as in (9) and (10), respectively.

$$ P = \frac{TP}{TP+FP} $$
(9)
$$ R = \frac{TP}{TP+FN} $$
(10)

where TP, FP, and FN denote true positive, false positive, and false negative, respectively.
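As a quick sanity check of Eqs. (8)-(10), the following compares a hand computation against scikit-learn on toy predictions:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

P = precision_score(y_true, y_pred)   # TP=3, FP=1 -> 0.75
R = recall_score(y_true, y_pred)      # TP=3, FN=1 -> 0.75
print(2 * P * R / (P + R), f1_score(y_true, y_pred))   # both 0.75
```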

Results and Discussions

The fused deep learning models were evaluated with each of the three features and their combinations on the three datasets. Table 3 lists the average F1-scores across ten random splits. We first compared the performance of each individual feature: FW, FP, and FS. The best individual feature was FW, which yielded an F1-score of 0.6576 on average across all datasets, models, and ten random splits, followed by FP (0.4669) and FS (0.4598). When two or more features were combined, the overall performance improved: FW + FS was the best contender at 0.6653, followed by FW + FP + FS (0.6647) and FW + FP (0.6593). Clearly, FW was the most important feature, because the top four contenders from all runs by all models on all datasets always included FW, either alone or in combination with other features, which was not true of any other feature. Combining FP and FS improved the overall performance to an F1-score of 0.5294, better than using either of them individually. Thus, combining features led to an improvement in overall performance.

Table 3 Performance comparison of all models with different combinations of features on three datasets

There was variation in the performances shown in Table 3. Eighteen judges (one per model-dataset combination) ranked the performance of each of the 7 feature sets based on F1-score. The ranks are shown in Table 4. The significance of the ranks was tested using Kendall's coefficient of concordance (W), which was 0.8137 (p < 0.01 with 6 degrees of freedom). W is particularly useful for testing inter-judge or inter-test reliability [83]. The rank sum of each feature set indicates the best overall ordering [83], which is as follows:

$$ F_{W}+F_{P}+F_{S} \sim F_{W}+F_{S} > F_{W}+F_{P} > F_{W} > F_{P}+F_{S} > F_{S} > F_{P}. $$
Table 4 Ranks assigned to the 7 feature sets by 18 judges (one per model-dataset pair), based on the F1-scores in Table 3

We further employed multiple t-tests on the results in Table 3 to test the significance of the differences between the means of two independent samples [84]. The tests show that every possible pairwise difference is very highly significant (p < 0.001) except FP vs. FS (p = 0.5122), FW vs. FW + FP (p = 0.3239), and FW + FS vs. FW + FP (p = 0.7013), which are less conclusive.

According to the tests, fusing features clearly improved the prediction performance. Findings from two of our previous works support this observation [11, 12]. For example, they suggest that combining FW and FP can improve prediction performance: FW captures some syntactic information about a word, while FP directly captures the grammatical type of a word. Pasupa and his colleagues showed that intransitive verbs (vi), transitive verbs (vt), adverbs (adv), common nouns (n), and adjectives (adj) are the word types that stimulate the strongest human emotions [19]. In addition, Pasupa and Seneewong Na Ayutthaya showed that simply including POS information for all words in a sentence can improve prediction performance [11], and that it was even better when POS information was included in the RNN model for only selected words of the five types above (n, vi, vt, adj, and adv).

Since the Wisesight dataset contained many more samples than the others, we separately evaluated the ranks of the performance of every feature set on two groups of datasets: (i) the small-sized dataset group, ThaiTales and ThaiEconTwitter, and (ii) the large-sized dataset group, Wisesight. In the analysis of the small-sized dataset group, the ranks of all feature sets led to W = 0.8585, significant at p < 0.01. This high value enabled us to report with confidence that the following ranking is valid:

$$ F_{W}+F_{S} > F_{W}+F_{P}+F_{S} > F_{W}+F_{P} > F_{W} > F_{P}+F_{S} > F_{P} > F_{S}. $$

In the analysis of the large-sized dataset group, the ranking of all features was:

$$ F_{W} > F_{W}+F_{P} > F_{W}+F_{P}+F_{S} > F_{W}+F_{S} > F_{P}+F_{S} > F_{S} > F_{P}. $$

This ranking was significant at p < 0.01 with W = 0.8254. According to it, the combinations that include FW occupy the top ranks. Combining additional information (sentic and POS) was able to improve on FW alone for the small-sized datasets; on the large-sized dataset, however, FW alone was the best contender. Please note that the benefit of combined features applied to all models.

Moreover, Table 5 shows the F1-score achieved by every model on each dataset (averaged across all feature sets and ten random splits). Focusing on the individual models, BLSTM and CNN: CNN outperformed BLSTM on the ThaiTales dataset, and its performance was better than BLSTM's for all feature sets, as shown in Table 3. However, BLSTM achieved an F1-score of 0.4694 on the Wisesight dataset, better than CNN's 0.4183, and it also outperformed CNN there for all feature sets. This might be because the Thai tales are simple in vocabulary and grammatical structure, so learning from neighboring words (local features) is more relevant than learning sentence sequences, whereas users may use difficult words or complex sentences on social media. On ThaiEconTwitter, CNN achieved an average F1-score of 0.6481 while BLSTM achieved 0.6502; however, it is inconclusive whether CNN performed worse than BLSTM there, because its performance was worse in only 4/7 cases.

Table 5 Average F1 scores of every model on three datasets, averaged across all features and ten random splits

The best performer was BLSTM-CNN, which achieved an F1-score of 0.6100 on average. Combining two models improved the performance in most cases, the exceptions being BLSTM×CNN on Wisesight and ThaiTales and CNN-BLSTM on ThaiTales. Overall, however, the fused models performed better than the individual ones on average.

Considering only the fused models, BLSTM-CNN was clearly better than CNN-BLSTM in all cases (all feature sets) on ThaiTales and ThaiEconTwitter, while it outperformed CNN-BLSTM in only 1/7 cases (FP + FS) on Wisesight, as shown in Table 3. Nonetheless, CNN-BLSTM was the worst combination in all cases on ThaiTales and in 5/7 cases on ThaiEconTwitter. Learning with the concatenated outputs of BLSTM and CNN (BLSTM+CNN) yielded better performance than the simple voting ensemble of both models (BLSTM×CNN) in all cases on the Wisesight dataset; on the other hand, BLSTM×CNN performed better than BLSTM+CNN in 5/7 cases on ThaiEconTwitter and 6/7 cases on ThaiTales. The best combination on Wisesight was CNN-BLSTM (in 6/7 cases) and the worst was BLSTM×CNN (in all cases).

We further investigated why BLSTM-CNN performed worse than CNN-BLSTM only on Wisesight. Figure 10 shows the sentence length (number of words in an item) distributions of all three datasets. The sentence length distributions of the Wisesight and ThaiEconTwitter datasets were clearly skewed to the right; i.e., the sentences in these two datasets were mostly short, with a mode of 5. The distribution for ThaiTales, by contrast, was close to normal (unskewed), with a mean of 16.84, a median of 15, and a mode of 15. The length of sentences in Wisesight varied from 1 to 428 words, a much wider range than in the other two datasets: the longest sentence was 74 words in ThaiEconTwitter and 68 words in ThaiTales.

Fig. 10 Sentence length distribution on Wisesight, ThaiEconTwitter, and ThaiTales datasets

Because of this variation, we divided each range into 10 equal intervals and tested the samples in each interval separately for each dataset. The F1-score for every interval is reported for all datasets in Fig. 11. BLSTM-CNN performed better than CNN-BLSTM in all intervals on ThaiTales and ThaiEconTwitter. On the Wisesight dataset, CNN-BLSTM achieved better performance than BLSTM-CNN only in three intervals, [0–43], [87–129], and [173–215], but these intervals contained 88.08%, 1.94%, and 0.25% of the samples, respectively. The dominant first interval therefore pulled the overall result in CNN-BLSTM's favor: CNN-BLSTM performed better than BLSTM-CNN on short sentences, while BLSTM-CNN outperformed CNN-BLSTM on long sentences in the Wisesight dataset.

Fig. 11 F1-scores for every sentence length interval in 10 test sets on all datasets, averaged across 10 runs and all features

As there was a degree of variation in the ranking of the models, a significance test using Kendall's coefficient of concordance was again conducted, this time to evaluate the ranking of the models. The six models were assigned ranks by 21 judges (one per feature-dataset combination), as shown in Table 6. The computed W was 0.3263, significant at p < 0.01. Given that a significant level of agreement between the rankings had been established, the best overall ordering of the models was derived from the rank sums, giving the following ranking:

$$ \text{BLSTM-CNN} > \text{BLSTM}+\text{CNN} > \text{BLSTM}\times\text{CNN} > \text{CNN-BLSTM} > \text{CNN} > \text{BLSTM}. $$
Table 6 Ranks assigned to 6 models by 21 judges (across all features and datasets) according to their F1-scores listed in Table 3

Again, multiple t-tests were conducted to test the significance of the differences between the mean F1-scores achieved by pairs of models; we tested every possible pair. The tests showed that the differences for all pairwise combinations were very highly significant (p < 0.001), except for BLSTM vs. CNN (p = 0.3354), CNN vs. CNN-BLSTM (p = 0.1779), and BLSTM+CNN vs. BLSTM×CNN (p = 0.2866), which were less conclusive.

In addition, the ranking of every model on the small-sized datasets led to W = 0.6257 (p < 0.01), giving the following overall ranking:

$$ \text{BLSTM-CNN} > \text{BLSTM}\times\text{CNN} > \text{BLSTM}+\text{CNN} > \text{CNN} > \text{BLSTM} > \text{CNN-BLSTM}, $$

while the ranking of every model on the large dataset was

$$ \text{CNN-BLSTM} > \text{BLSTM-CNN} > \text{BLSTM}+\text{CNN} > \text{BLSTM} > \text{BLSTM}\times\text{CNN} > \text{CNN}. $$

The computed value of W was 0.9005 on the large dataset—significant at p < 0.01.

According to [85], the bigger the sentiment lexicon, the better the prediction accuracy. Therefore, we plotted the average ratio of the number of words with a sentic value to the total number of words in a sentence for each dataset against the average F1-scores across all models and features on the test sets, as shown in Fig. 12. There was no significant correlation between these F1-scores and ratios, either within each dataset or across all datasets. The ratios of ThaiEconTwitter and Wisesight were about the same, but their F1-scores were far apart. On the other hand, ThaiTales had the largest ratio of the three datasets, yet its F1-scores were lower than those of ThaiEconTwitter, which had a smaller ratio.

Fig. 12 Plot of average ratios of the number of words with sentic values to the number of words in a sentence against average F1-scores achieved by every model and feature across all three datasets

In addition, we show the confusion matrices of BLSTM-CNN with FW + FP + FS for each dataset, since this combination was the best contender according to our analysis (Fig. 13). Regarding the misclassified samples in the ThaiTales dataset, the model tended to misclassify negative samples as neutral rather than as positive, and positive samples were likewise misclassified as neutral. This shows that the classifier tended to predict toward the majority class, an observation that also applies to the remaining datasets. The majority class of ThaiEconTwitter was the negative class, and the misclassified positive and neutral samples were indeed assigned to it. It should be noted that the model predicted each class well on the ThaiTales and ThaiEconTwitter datasets but not on the Wisesight dataset, where most misclassified samples were, as expected, assigned to the majority (neutral) class. The model correctly classified 4741 of 9820 positive samples (48.3%), while classifying 4165 positive samples as neutral (42.4%). Likewise, only 283 of 1100 question-class samples were correctly classified (25.5%), while 617 question-class samples were classified as neutral (55.6%). Overall, all tested classifiers tended to be biased toward the majority class.

Fig. 13 Confusion matrices of BLSTM-CNN with FW + FP + FS features on test sets, from a total of 10 runs with different random splits

We further performed an error analysis to identify what caused errors in the best model, BLSTM-CNN with the FW + FP + FS feature set. Examples of prediction errors are shown in Fig. 14.

Fig. 14 Examples of sentences for which BLSTM-CNN predicted the wrong sentiment

The sentence in Example 1 can be divided into two parts: (1) and (2) . The first part is a sarcastic remark that the Japanese consume too much of only Japanese-made products, with a negative sentiment of over-consumption from the Thai word " ," a shortened pronunciation of the English word "over." The second part is a remark with a positive sentiment, supporting the idea that nationalism benefits the domestic economy. The sentiment of this two-part sentence should be positive, but BLSTM-CNN predicted it as negative because it focused on the first part and did not recognize that part as sarcasm.

The sentence in Example 2 can also be divided into two parts: (1) and (2) global . The words in the first part clearly convey a negative sentiment of being annoyed (by), while the words in the second part state that the Olympic Games are a world-class athletic event that every country wants to host to benefit its own economy, which conveys a positive sentiment. The sentiment of the whole sentence should be negative because of the annoyance, but BLSTM-CNN focused on the second part and predicted the sentiment of the whole sentence as positive.

Conclusion

This paper proposes a Thai sentiment analysis framework comprising data pre-processing, feature extraction, and model construction. In addition, we propose the Thai-SenticNet5 corpus, built on SenticNet5 in association with LEXiTRON, Volubilis, and Thai-WordNet. Furthermore, four hybrid deep learning models (BLSTM-CNN, CNN-BLSTM, BLSTM+CNN, and BLSTM×CNN) are proposed and evaluated on three datasets: ThaiTales, ThaiEconTwitter, and Wisesight. Three types of features (word embedding, POS tag, and sentic) were used to represent the meaning, POS, and sentiment of a word, respectively, and all of their combinations were also evaluated. The results show that feature combination improved the overall performance of sentiment analysis; the best candidate was the combination of word embedding, POS, and sentic features, which led to the highest F1-score. Moreover, the results demonstrate that hybrid deep learning models enhance task performance, with BLSTM-CNN the overall best contender.

As mentioned, our datasets were mostly imbalanced, but the current models did not account for class imbalance, which causes a model to be biased toward the majority class. In future work, we will consider applying focal loss, which can handle the class imbalance problem and has been successfully evaluated on image data, e.g., red blood cell classification [86]. Also, a more recent version of SenticNet integrates symbolic models and subsymbolic methods to encode meaning and learn syntactic patterns from data [87]; it could be employed to improve the overall performance of the sentiment analysis task.