A novel framework for aspect based sentiment analysis using a hybrid BERT (HybBERT) model

Goud, Anushree; Garg, Bindu

doi:10.1007/s11042-023-17647-1

1236: Explainable Artificial Intelligence Solutions for In-the-wild Human Behavior Analysis
Published: 21 November 2023

Multimedia Tools and Applications

Abstract

Sentiment analysis has become a pivotal technique for extracting insights from textual data, and aspect-based sentiment analysis (ABSA) has emerged as one of its most prominent methods. ABSA dissects textual content in order to associate sentiments with its distinct elements. This paper examines the efficacy of ABSA models and explores different methodologies for handling the intricate scenarios that ABSA presents. Among the available techniques, transformer-based models such as BERT, RoBERTa, and DistilBERT have gained substantial traction in sentiment analysis, text extraction, and natural language processing (NLP), and numerous research efforts have applied these transformer models to enhance ABSA performance. To bridge the gap between theory and practice, we consider a hybrid BERT model, termed HybBERT, which blends the strengths of BERT, RoBERTa, and DistilBERT. Using data from a Hugging Face dataset, our study processes the text to identify traits related to ABSA and presents an extensive evaluation of multiple models within the ABSA framework. Each model's performance is scrutinised and benchmarked against the other models. The assessment covers accuracy, precision, recall, and F1-score, which together provide a holistic view of performance. The results indicate an advancement in ABSA performance and highlight the value of a hybrid transformer model for aspect-level sentiment analysis.

1 Introduction

The creation of textual data has recently grown due to the expansion of social media platforms. The rising usage of online technology has changed how people communicate through user-generated content on e-commerce websites, social networks, blogs, etc. Following the tremendous popularity of these technologies, there has been a great interest among the researchers to explore data mining technologies for analyzing the subjective information. One such prominent research area is sentiment analysis (SA), which is used to understand user opinion and identify their view about a particular domain, website, or product. From both a commercial and academic point of view, SA is considered as an important task. Sentiment analysis is a technique that, in general, uses the polarity of the text to determine and classify the emotions and sentiment of a particular opinion or piece of user feedback. Sentiment analysis (SA) is carried out at three levels: document level, sentence level, and aspect level, to ascertain if the text from a document, a phrase, or an aspect expressed is positive, negative, or neutral. Regardless of the entities and other factors present, the majority of techniques now in use attempt to discern the polarity of the text or phrase. Contrarily, aspect-based sentiment analysis (ABSA), which is the focus of SA, aims to identify the aspects of various entities and examine the sentiment expressed for each aspect.

ABSA focuses on identifying the sentiments rather than the structure of the language. An aspect is basically related to an entity and the fundamental principle of an aspect is not restricted to any decision-making process, and can be extended towards the understanding of thoughts, user perspective, influence of social factors and ways of thinking. Hence, ABSA can be considered as an effective tool for understanding the sentiments of the users over a period of time across different domains. Because of its ability to achieve accurate sentiment classification considering different aspects, ABSA has gained huge significance in recent times.

The ABSA process is organized into three processing stages: Aspect Extraction (AE), Aspect Sentiment Analysis (ASA), and Sentiment Evolution (SE). In the first stage, several kinds of aspects are retrieved, including explicit aspects, implicit aspects, aspect words, entities, and opinion target expressions. The second stage classifies the sentiment polarity for a specific aspect or entity. This stage also models interactions and analyzes the contextual and semantic relationships between multiple entities to increase sentiment classification accuracy. The third stage studies the user's emotions towards various aspects over time. It takes into account the user's social characteristics and experience, which are two key factors in sentiment analysis.

Compared to basic sentiment analysis, ABSA is a complex task since it involves both sentiments and aspects. ABSA faces multifaceted challenges in extracting opinion target expressions (OTE), aspects with neutral sentiments, and explicit aspects. It is also difficult to examine the relationships between various data objects in order to improve ABSA. Advances in natural language processing (NLP), driven by AI-based machine learning and deep learning, have had a significant impact on the architecture of pre-trained models such as ELMo, BERT, RoBERTa, and DistilBERT. These models are pre-trained on large amounts of unlabelled text, and because of their versatility, ABSA performance has improved while the need for labelled data has been reduced. The use of a hybrid BERT (HybBERT) model, which incorporates the BERT, RoBERTa, and DistilBERT models for ABSA, is the main topic of this research.

The prominent contributions of this research can be summarized as follows:

This paper provides a comprehensive evaluation of the ABSA using the proposed HybBERT model.
The bi-directional transformer models used in this paper are pre-trained on large-scale unlabeled textual data to represent the language and can be fine-tuned to perform specific learning-based tasks such as ABSA.
The paper presents a case study wherein two models are combined and the performance of the hybrid model is tested against a single model.

The paper is further structured as follows: Section 2 discusses the study of various existing research methodologies on ABSA. Section 3 discusses the proposed methodology for ABSA using the HybBERT model. Section 4 discusses the results of the experimental analysis with respect to different combinations of the ABSA models, and Section 5 concludes the paper by outlining the experimental observations.

2 Related Works

ABSA is not a straightforward task and involves many challenges. Several researchers have analyzed these challenges and have come up with different solutions that have proven effective in identifying and classifying different sentiments (Mercha & Benbrahim, 2023) [1] (Chandra & Jana, 2020) [2] (Gadri et al., 2022) [3] (Liu et al., 2020) [4]. These techniques have provided pictorial representations and structured approaches to handle complex tasks such as emotion recognition and sentiment analysis. Since an aspect can be expressed using different words, it might require more than one classification algorithm, and in such cases both ML and DL models have shown promising results (Do et al., 2019) [5]. Sentiment analysis (SA) makes use of both syntactic and semantic information for classifying the polarities (Rezaeinia et al., 2019) [6]. Here, the context of each word in the sequence might be different, and the context is determined by the other words in the sequence. Different DL models such as the Convolutional Neural Network (CNN) (Liao et al., 2017) [7], Recurrent Neural Network (RNN) (Usama et al., 2020) [8, 45], and Long Short-Term Memory (LSTM) (Gandhi et al., 2021) [9] attempt to understand the context or aspect of a word with long-term dependencies. By studying the long-term dependencies, attempts have been made to learn and recognize the features of the word or sentence. Based on this, it is crucial for the model to learn the aspect of words by analyzing the aspect of complete sentences, without depending on the length and bidirectionality of the aspects of adjacent words in parallel (Wei Song, Zijian Wen, Zhiyong Xiao, and Soon Cheol Park) [10].
An ML-based sentiment analysis approach for predicting different sentiments from social media data is proposed in (Wu et al., 2020) [11] (Broek-Altenburg & Adam J.Atherly) [32] (Li, Z., Fan, Y., Jiang, B) [41]. By examining the semantic associations between two words, a hybrid strategy integrating bi-directional long short-term memory (Bi-LSTM) and convolutional neural networks (CNN) is designed to detect various emotion labels derived from psychiatric social texts (P. Bhuvaneshwari, A. Nagaraja Rao, Y. Harold Robinson & M. N. Thippeswamy) [38] (Hengyun Li, Bruce X.B. Yu, Gang Li, Huicai Gao) [43]. Results show that the hybrid Bi-LSTM-CNN model outperforms other sentiment analysis models. However, the performance of the model can be improved by capturing more sentiment information. A supervised ML-based sentiment analysis approach using Adaboost and multilayer perceptron (MLP) models is presented in (Aishwarya et al., 2019) [12]. The sentiments were classified by identifying the polarity of the available texts from the social media data. Furthermore, the Adaboost-MLP model classified the data into multiple small classes, and a novel approach based on uppercase letters and repeated letters is used to balance the sentiment weight factors. However, a larger dataset is used to train the model, and the performance is tested using small samples. For sentiment analysis and emotion recognition, a DL-based multi-task learning architecture is used in (Akhtar et al., 2019) [13]. A bi-directional Gated Recurrent Unit (biGRU) model is employed for extracting the contextual data. A context-level inter-modal attention technique is implemented for accurately predicting the sentiments using the CMU-MOSEI dataset. Although the model achieves better performance in a multi-task learning environment, there is a need to explore other aspects, such as emotion classification and intensity prediction, in the proposed multi-task framework.
In addition, it is also important to analyze the sentiment by considering the entity or aspect of the sentence. Because they overcome the drawback of long-term dependency, LSTM models have gained significance and have been successfully applied to ABSA (Bao et al., 2019) [14] (Xing et al., 2019) [15] (Xu et al., 2020) [16, 46]. The effectiveness of LSTM models has been experimentally validated in several works. (Al-Smadi et al., 2019) [17] employed the LSTM model for ABSA of Arabic reviews, and ABSA performance increased by 39%. (Alexandridis et al., 2021) [18] implemented a knowledge-based DL architecture for ABSA by integrating a hybrid Bi-LSTM with convolutional layers and an attention mechanism for enhancing the textual features. The LSTM is combined with fuzzy logic for ABSA of mobile phone reviews in (Sivakumar & Uyyala, 2021) [19], where the highest accuracy of 96.93% was obtained in comparison with other techniques. The review of DL approaches for ABSA presented in (Do et al., 2019) [5] states that research on ABSA with deep learning is still at an early stage and that there is great scope for further investigation. Considering the relationship between an opinion and the aspect, the performance of ABSA models can be improved by simultaneously extracting the aspect and classifying the sentiment. Most works have focused only on aspect categorization, and the models used to perform both aspect recognition and sentiment analysis have not achieved the desired performance. Hence, it is suggested to design a combined approach that performs both tasks and provides a more precise sentiment analysis at the aspect level. In this context, the use of pre-trained language models can be investigated for ABSA (Zhang et al., 2022) [20] (Shim et al., 2021) [21] (Troya et al., 2021) [22].

Fundamentally, the learning process of pre-trained language models differs from that of conventional pre-trained embedding models in how words are embedded for a given aspect. The pre-trained language models learn feature representations from unlabeled text and are fine-tuned to overcome the problem of labeled data deficiency for training. Recently, transformer-based methods such as BERT (Devlin et al., 2018) [23], RoBERTa (Wang et al., 2019) [24], DistilBERT (Sanh et al., 2019) [25], and XLNet (Yang et al., 2019) [26] have become the most prominent bidirectionally pre-trained language models (Mao et al., 2020) [27]. The BERT model consists of several attention layers and is trained to perform downstream tasks such as emotion recognition and sentiment analysis. The BERT model can be trained to obtain deep bidirectional representations from unlabeled text by concatenating all the layers in the model (Sun et al., 2019) [28]. Hence, a pre-trained BERT model can be fine-tuned by adding one additional layer to perform a wide range of tasks without requiring any major architectural modifications (Li et al., 2019) [29]. The performance of the BERT model is compared with that of other pre-trained transformer models, namely RoBERTa, DistilBERT, and XLNet, in (Adoma et al., 2020) [30] for identifying emotions from texts. That work analyzes the sentiment by considering the output of each model and comparing it with the other models. To classify various emotions, including anger, disgust, sadness, fear, joy, shame, and guilt, all four models are fine-tuned using the ISEAR data. Experimental analysis shows that the RoBERTa, XLNet, BERT, and DistilBERT models achieve an accuracy of 74.31%, 72.99%, 70.09%, and 66.93% respectively. These accuracies are obtained by each model individually. (Phan et al., 2020) [31] extracted syntactical features for ABSA using BERT and RoBERTa and reported strong results.
Although these models have shown excellent results as individual models, there is a need for more exploration in this direction. In the future, an ensemble model can be implemented that combines all three models, or any two of them, to improve the accuracy of emotion recognition and sentiment analysis. Considering the excellent attributes of the transformer-based models, this research emphasizes the application of the BERT, RoBERTa, and DistilBERT models for ABSA.

Aspect-based sentiment analysis is a composite task within sentiment analysis that encapsulates multiple challenges. Researchers have handled these challenges and laid out effective solutions, as represented by their contributions to sentiment classification (Mercha & Benbrahim, 2023) [1], (Chandra & Jana, 2020) [2], (Gadri et al., 2022) [3], and (Liu et al., 2020) [4]. These solutions use visual aids and structured methodologies to manage intricate tasks such as emotion recognition and sentiment analysis.

Research	Contributions
Mercha & Benbrahim	Identified challenges and proposed solutions [1]
Chandra & Jana	Addressed challenges and provided insights [2]
Gadri et al	Proposed effective strategies for sentiment analysis [3]
Liu et al	Introduced innovative approaches for sentiment analysis [4]

LSTM models are known for handling long-term dependency issues and have been successfully applied in ABSA (Bao et al., 2019) [14], (Xing et al., 2019) [15], and (Xu et al., 2020) [16]. These models have demonstrated substantial performance improvements, for example, a 39% enhancement in ABSA performance for Arabic reviews (Al-Smadi et al., 2019) [17].

Research	LSTM Application
Bao et al	Successful LSTM-based ABSA implementation [14]
Xing et al	Improved ABSA using LSTM models [15]
Xu et al	Application of LSTM for effective ABSA [16]

Pre-trained models such as BERT, RoBERTa, and DistilBERT have gained prominence in ABSA research (Zhang et al., 2022) [20], (Shim et al., 2021) [21], and (Troya et al., 2021) [22]. These bidirectional models can be fine-tuned to achieve strong results in different sentiment-related tasks.

Transformer Model	Key Features & Advantages
BERT	Bidirectional contextual representations [23]
RoBERTa	Optimized pre-training for enhanced performance [24]
DistilBERT	Compact version of BERT with efficient training [25]

ABSA poses multiple challenges that researchers have tried to address through different innovative solutions. DL models, LSTM approaches, and transformer models are well positioned to advance sentiment analysis, especially within complex contexts like ABSA.

3 Proposed Methodology: Aspect-Based Sentiment Analysis (ABSA) using HybBERT model

In order to understand the user's emotion, this study analyses the sentiment of text data while taking several aspects into account. For sentiment analysis, this study uses a hybrid BERT model known as the HybBERT model. The proposed hybrid model combines three bi-directional transformer-based models, namely BERT, RoBERTa, and DistilBERT, for simultaneously performing aspect recognition and sentiment analysis. These models are pre-trained on large-scale unlabeled textual data to represent the language and are fine-tuned to perform ABSA. ABSA is analyzed through different combinations of models, wherein two models are combined and the performance is compared with that of a single model. The workflow of the proposed approach is illustrated in Fig. 1.

4 Materials and Methods

From an academic and professional perspective, sentiment analysis is becoming more and more recognised as a crucial task. However, the bulk of current systems aim to identify the overall polarity of a sentence, paragraph, or text span, regardless of the entities (such as laptops and restaurants) and their components (such as the battery, screen, food, and service). Aspect-based sentiment analysis (ABSA), in contrast, is the focus of this task. Its objective is to detect the aspects of the target entities and the sentiment expressed towards each aspect. Datasets of customer reviews with human-authored annotations indicating the specified properties of the target entities and the polarity of each property's sentiment will be made available.

The task specifically consists of the following subtasks:

Subtask 1: Extracting aspect terms

Determine the aspect terms that are present in a set of sentences with pre-identified entities (such as restaurants) and provide a list of all the different aspect terms. A specific aspect of the target entity is identified by an aspect word.

For instance, "The food was nothing special, but I loved the staff," or "I liked the service and the staff, but not the food." An aspect term may consist of more than one word: in "The hard disc is very noisy", the only aspect term is "hard disc".

Subtask 2: Polarity of the aspect word

Determine the polarity of each aspect term in a statement, such as whether it is positive, negative, neutral, or in conflict (both positive and negative).
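As an illustration of what the two subtasks produce, the following minimal sketch represents one annotated sentence in plain Python. The class and function names (`AspectTerm`, `Sentence`, `aspect_terms`, `polarity_of`) are hypothetical, and the polarity labels are assigned by hand for the example sentences quoted above; they are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import List

# The four polarity classes named in Subtask 2.
POLARITIES = {"positive", "negative", "neutral", "conflict"}

@dataclass
class AspectTerm:
    term: str      # surface form of the aspect term, possibly multi-word
    polarity: str  # one of POLARITIES

@dataclass
class Sentence:
    text: str
    aspects: List[AspectTerm]

def aspect_terms(sentence):
    """Subtask 1: list the distinct aspect terms of a sentence, in order."""
    seen = []
    for aspect in sentence.aspects:
        if aspect.term not in seen:
            seen.append(aspect.term)
    return seen

def polarity_of(sentence, term):
    """Subtask 2: look up the polarity attached to one aspect term."""
    for aspect in sentence.aspects:
        if aspect.term == term:
            return aspect.polarity
    raise KeyError(term)

# Hand-labelled illustrations (assumed labels, not dataset annotations).
review = Sentence(
    text="I liked the service and the staff, but not the food.",
    aspects=[
        AspectTerm("service", "positive"),
        AspectTerm("staff", "positive"),
        AspectTerm("food", "negative"),
    ],
)
laptop = Sentence(
    text="The hard disc is very noisy",
    aspects=[AspectTerm("hard disc", "negative")],
)
```

Here `aspect_terms(laptop)` returns the single multi-word term, which mirrors the "hard disc" example in the task description.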

Here is the dataset link which will be used in the code:

https://huggingface.co/datasets/Yaxin/SemEval2014Task4Raw/viewer/All/train

4.1 Data Acquisition

The first part of the process involves gathering input data from the database, which is known as data collection or acquisition. In this research, the data is collected from the Hugging Face dataset (Geetha & Renuka, 2021) [42]. The dataset features define the internal structure of the dataset and contain high-level information about its fields, such as the number of columns, their types, the data labels, and the conversion methods. The dataset provides fields such as text, aspect terms, aspect categories, domain, and sentence ID. The class labels are drawn from a predefined set of classes and are stored as integers.

4.2 Data Preprocessing

The data extracted from the dataset is in basic raw format and needs to be processed before feeding it to the model for training. By removing uncertainties like missing data, null values, noisy data, and other irregularities, the raw data is transformed into a clean dataset at this step. Preprocessing is the basic step in the ABSA process and involves different steps such as:

Sorting the dataset according to the column
Shuffling the dataset
Filtering the rows according to the index
Concatenating the datasets with the same column types
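The four operations listed above can be sketched on a small in-memory table. This is a minimal illustration using plain Python lists of dictionaries, not the authors' code; the helper names and the sample rows are hypothetical. The Hugging Face `datasets` library offers analogous operations (`sort`, `shuffle`, `select`, and `concatenate_datasets`) for real datasets.

```python
import random

def sort_rows(rows, column):
    """Return the rows ordered by the values in one column."""
    return sorted(rows, key=lambda row: row[column])

def shuffle_rows(rows, seed=0):
    """Return a shuffled copy; the fixed seed keeps the order reproducible."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def select_rows(rows, indices):
    """Keep only the rows at the given positions."""
    return [rows[i] for i in indices]

def concatenate(first, second):
    """Join two tables, refusing to mix tables with different columns."""
    if first and second and set(first[0]) != set(second[0]):
        raise ValueError("tables have different columns")
    return list(first) + list(second)

# Hypothetical sample rows with the same columns in both tables.
restaurants = [
    {"sentence_id": 2, "text": "I loved the staff.", "domain": "restaurants"},
    {"sentence_id": 1, "text": "The food was nothing special.", "domain": "restaurants"},
]
laptops = [
    {"sentence_id": 3, "text": "The hard disc is very noisy.", "domain": "laptops"},
]

combined = concatenate(restaurants, laptops)
ordered = sort_rows(combined, "sentence_id")
first_two = select_rows(ordered, [0, 1])
mixed = shuffle_rows(ordered, seed=42)
```

Each helper returns a new list, so the original tables stay unchanged and the steps can be applied in any order.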

The obtained data consisted of multiple columns containing responses, opinions, and feedback. Only the columns holding the sentiment labels and the individual responses are considered in this research for ABSA, so these two columns are subjected to preprocessing. It was observed that a few entries contained sentiment labels without any textual response. In addition, the data contained special characters, irrelevant tags, double spacing, and redundant expressions, which are filtered out since they can have a negative impact on the performance of sentiment analysis. After filtering, the dataset taken into consideration for the analysis has a total of 4631 rows and 2 columns. The data are divided into training, testing, and validation samples, with 3509 samples used for training, 1028 for testing, and 94 for validation.
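The cleaning step and the split sizes can be illustrated with a short sketch. The regular expressions below are assumptions about what counts as a tag or a special character, and dropping label-only entries is likewise an assumed handling; the paper does not give the exact rules. The split counts are the ones reported above.

```python
import re

TAG = re.compile(r"<[^>]+>")                   # markup-like tags (assumed form)
SPECIAL = re.compile(r"[^A-Za-z0-9\s.,!?'-]")  # assumed set of "special" characters
SPACES = re.compile(r"\s+")                    # runs of whitespace, incl. double spaces

def clean_text(text):
    """Strip tags and special characters, then collapse repeated spaces."""
    text = TAG.sub(" ", text)
    text = SPECIAL.sub(" ", text)
    return SPACES.sub(" ", text).strip()

def drop_empty(rows):
    """Discard entries that have a label but no usable text (assumed handling)."""
    return [row for row in rows if clean_text(row["text"])]

# Split sizes as reported in the text.
SPLIT = {"train": 3509, "test": 1028, "validation": 94}
total = sum(SPLIT.values())
shares = {name: count / total for name, count in SPLIT.items()}

rows = [
    {"text": "The <b>food</b>  was   great ### !!", "label": 1},
    {"text": "   ", "label": 0},
]
kept = drop_empty(rows)
cleaned = [clean_text(row["text"]) for row in kept]
```

The three reported counts add up to the 4631 rows stated above, which corresponds to roughly 76% training, 22% testing, and 2% validation data.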

4.3 Aspect-based feature extraction

In general, feature extraction is performed to extract the features that are relevant to the sentiment analysis and classification process (Meng et al., 2019) [33]. When most of the features contribute little to the total variance, the dimensionality is reduced by extracting just the relevant features. Removing unnecessary and duplicate features cuts down the computing time and improves the overall performance. Aspect-based feature extraction techniques extract the essential features that define the text's primary traits, or aspects. This helps in finding important textual elements that reflect an opinion or a mood. In order to choose pertinent and appropriate features from the text, new feature subsets are acquired at this stage. For aspect-based feature extraction, two bags of words are created, where the first group contains the aspects and the second group contains the tendency of sentiment polarity.
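One possible reading of the two bags of words is a pair of frequency counts, one over aspect terms and one over polarity labels. The sketch below follows that reading; the function name `build_bags` and the hand-labelled examples are hypothetical, and the paper does not specify how the bags are actually built.

```python
from collections import Counter

def build_bags(annotated):
    """Build one bag of aspect terms and one bag of polarity labels.

    `annotated` is a list of (text, [(aspect, polarity), ...]) pairs.
    """
    aspect_bag = Counter()
    polarity_bag = Counter()
    for _text, pairs in annotated:
        for aspect, polarity in pairs:
            aspect_bag[aspect.lower()] += 1
            polarity_bag[polarity] += 1
    return aspect_bag, polarity_bag

# Hand-labelled illustrations (assumed labels).
annotated = [
    ("I liked the service and the staff, but not the food.",
     [("service", "positive"), ("staff", "positive"), ("food", "negative")]),
    ("The food in this restaurant is great",
     [("food", "positive")]),
]

aspect_bag, polarity_bag = build_bags(annotated)
```

In this toy example the aspect bag records that "food" occurs twice, while the polarity bag shows the overall tendency towards positive sentiment.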

In aspect-based feature extraction, it is highly challenging to extract explicit aspects, aspect categories, and the opinion terms (Opinion Term Extraction, OTE) through which the sentiment is expressed (Zhang et al., 2022) [34]. An explicit aspect here means that the aspect is mentioned directly in the text. For instance, in “the food in this restaurant is great”, food is an explicit aspect and restaurant is the explicit entity. While performing aspect extraction, the relations between different aspects are also analyzed to identify coherent, consistent, and highly similar aspects in the dataset, in order to improve the overall representation of the extracted aspects.

4.4 Aspect Classification using HybBERT model

This study implements a HybBERT model for aspect classification and ABSA. During the training phase, the models will use examples from the data set to determine how to transform the input text into a class of pertinent attributes (Lu Xu, Lidong Bing, Wei Lu, Fei Huang, November 2020) [25]. The feature vectors and appropriate class labels will be given for the proposed HybBERT model. The classification models used for ABSA are explained as follows:

4.4.1 BERT Model

BERT is a bi-directional transformer model that is pre-trained on a large volume of unlabeled textual data to learn a representation of the language and is then fine-tuned to perform a specific task. The BERT model performs strongly when applied to natural language processing (NLP) and sentiment analysis tasks (Howard & Ruder, 2018) [35] (Radford et al., 2018) [36]. The improved performance of the BERT model is attributed to the bidirectional transformer and to its pre-training objectives, which include next sentence prediction. BERT employs a fine-tuning mechanism that almost eliminates the need for a task-specific architecture. Hence, the BERT model reduces the prior knowledge needed when designing a model and instead enables the model to learn from the available data. The BERT model has two architectural settings, BERT_BASE and BERT_LARGE. BERT_BASE has 12 transformer layers, 768 hidden dimensions, and 12 attention heads, with 110 M parameters. BERT_LARGE has 24 transformer layers, 1024 hidden dimensions, and 16 attention heads, with 340 M parameters.
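The quoted parameter counts can be approximately reproduced from the layer and hidden sizes. The sketch below is a back-of-the-envelope calculation under several assumptions that are not stated in the text: a WordPiece vocabulary of 30,522 tokens, 512 position embeddings, 2 segment types, a feed-forward width of four times the hidden size, and a pooling layer on top. The function name is hypothetical.

```python
def bert_parameter_count(layers, hidden, vocab_size=30522,
                         max_positions=512, type_vocab=2):
    """Estimate the number of trainable parameters of a BERT encoder."""
    ffn = 4 * hidden          # assumed feed-forward width
    layer_norm = 2 * hidden   # scale and shift vectors of one LayerNorm

    # Token, position, and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Query, key, value, and output projections (weights plus biases).
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers of the feed-forward block (weights plus biases).
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm

    # Dense pooling layer applied to the first token.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler

base = bert_parameter_count(layers=12, hidden=768)
large = bert_parameter_count(layers=24, hidden=1024)
```

Under these assumptions the estimate is about 109.5 M parameters for the base setting and about 335.1 M for the large setting, in line with the rounded figures of 110 M and 340 M. The number of attention heads does not enter the count, because the heads only partition the hidden dimension.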

In this research, the BERT model is fine-tuned to perform two tasks, namely aspect extraction (AE) and ABSA. For aspect extraction, the BERT model is provided with continuous samples of data which are labeled as A and B as aspects. An input text of n words is constructed as x = (x1, …, xn). After h = BERT(x), a dense layer followed by a Softmax layer is applied at each position of the text, denoted as l = softmax(W · h + b), where W and b are the weight matrix and bias of the dense layer, h is the hidden representation produced by BERT, and l holds the label probabilities for each position of the sequence. Softmax is applied along the label dimension for each position, and the labels are predicted position by position.
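The per-position classification l = softmax(W · h + b) can be illustrated with toy numbers. In the sketch below the hidden vectors stand in for the output of BERT, and the hidden size of 2, the three labels, and all weights are illustrative assumptions rather than values from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def tag_positions(h, W, b):
    """Apply a dense layer and a softmax at every token position.

    h: one hidden vector per token (stand-in for h = BERT(x))
    W: weight matrix with one row per label
    b: bias with one entry per label
    """
    probabilities = []
    for h_t in h:
        logits = [sum(w * x for w, x in zip(row, h_t)) + bias
                  for row, bias in zip(W, b)]
        probabilities.append(softmax(logits))
    return probabilities

# Toy values: three token positions, hidden size 2, three labels.
h = [[2.0, 0.0], [0.0, 3.0], [-1.0, -1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]

l = tag_positions(h, W, b)
predicted = [row.index(max(row)) for row in l]
```

Each row of `l` is a probability distribution over the labels for one position, and taking the arg-max per row gives the predicted label sequence.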

For aspect classification, the BERT model is fine-tuned to classify the polarity of the sentiment (positive, negative, or neutral) based on the aspect extracted from the text. For ABSA, the model is provided with two inputs, namely an aspect and a text containing the aspect. Let p = (p1, …, pm) be an aspect with m tokens and t = (t1, …, tn) be the text containing the aspect.

Similar to aspect extraction, after h = BERT(x), where x is built from the aspect p and the text t, the learned representation of the BERT model is leveraged to obtain the polarity of the sentiment expressed towards the aspect.
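The text does not spell out how the aspect and the text are combined into a single input. A common convention for BERT sentence pairs is to join them as `[CLS] aspect [SEP] text [SEP]` with segment identifiers 0 and 1, and the sketch below assumes that convention. The whitespace tokenisation and the function name are simplifications for illustration.

```python
def build_pair_input(aspect_tokens, text_tokens):
    """Join an aspect and its text in the standard BERT sentence-pair layout.

    Returns the token sequence and one segment id per token
    (0 for the aspect part, 1 for the text part).
    """
    tokens = ["[CLS]"] + aspect_tokens + ["[SEP]"] + text_tokens + ["[SEP]"]
    segment_ids = [0] * (len(aspect_tokens) + 2) + [1] * (len(text_tokens) + 1)
    return tokens, segment_ids

# Whitespace splitting stands in for a real subword tokenizer.
aspect = "hard disc".split()
text = "The hard disc is very noisy".split()

tokens, segment_ids = build_pair_input(aspect, text)
```

In this layout the representation of the first token (`[CLS]`) is typically passed to a classification layer that outputs the polarity.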

Fine-tuning the BERT model is simple and straightforward, since the learning capability of the transformer supports the model in various downstream tasks, whether classifying a single text or multiple texts, by interchanging the respective inputs and outputs (Azhar & Khodra, 2020) [37]. For classifying the sentiments in text pairs, the BERT model encodes the two texts jointly and performs bidirectional cross attention between the two sentences.

4.4.2 RoBERTa Model

RoBERTa is an improved version of the BERT model that was first presented by Facebook. It is obtained by retraining BERT with more computing resources and a better training process. To improve training, RoBERTa removes the Next Sentence Prediction (NSP) task from the BERT model and employs a dynamic masking approach, so that the masked tokens change as the model is trained over several epochs. This also enables RoBERTa to train on larger training sets. The text obtained from the dataset is given as input to the model, and the outputs are taken from the last layer of the model. When RoBERTa is used for ABSA, it is fine-tuned with a cross-entropy loss function. With dynamic masking, the model sees a different masking pattern of the text each time instead of the single fixed pattern used in the BERT model. This improves the learning ability of RoBERTa by exposing it to a more diverse set of training examples, and it makes the model more resilient to changes in the input data, which matters when handling diverse, irregular, and inconsistent text. In addition, the RoBERTa model employs a No Mask Left Behind (NMLB) technique, which ensures that every token is masked at least once during training, unlike the Masked Language Modeling (MLM) technique of BERT, which masks only 15% of the tokens. This improves the representation ability of the text and helps the model to exhibit better ABSA performance. For ABSA, the RoBERTa model consists of four modules, namely (i) a word embedding layer, (ii) a semantic representation layer, (iii) a cross attention layer, and (iv) a classification output layer.
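The difference between a fixed masking pattern and dynamic masking can be sketched in a few lines. This is a simplified illustration: every selected token is replaced by `[MASK]`, whereas the actual pre-training recipes also substitute random tokens or leave some selected tokens unchanged. The function names, the 15% rate as a default argument, and the toy tokens are assumptions for the example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, rng, rate=0.15):
    """Replace a random subset of the tokens (about `rate` of them) by MASK."""
    count = max(1, round(rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), count))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

def static_epochs(tokens, epochs, seed=0):
    """Static masking: the pattern is drawn once and reused in every epoch."""
    masked = mask_tokens(tokens, random.Random(seed))
    return [list(masked) for _ in range(epochs)]

def dynamic_epochs(tokens, epochs, seed=0):
    """Dynamic masking: a fresh pattern is drawn each time the text is seen."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, rng) for _ in range(epochs)]

tokens = [f"w{i}" for i in range(20)]  # toy sentence of 20 distinct tokens
static = static_epochs(tokens, epochs=4)
dynamic = dynamic_epochs(tokens, epochs=4)
```

With static masking the model is shown the same masked positions in every epoch, while dynamic masking redraws the positions, so over many epochs more of the tokens take a turn as prediction targets.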

In the word embedding layer, the ROBERTA model is pre-trained using multiple embedding layers over sentences and aspects to obtain an improved representation of aspects. The transformer used in the ROBERTA model captures the bidirectional relationships within the sentence and maps each word $w_i$ in the sentence and its aspect to a low-dimensional vector $v_i \in \mathbb{R}^{d_w}$, where $d_w$ is the dimension of the word vector. The pre-training yields the sentence vector $\{v_1^s, v_2^s, v_3^s, \ldots, v_n^s\} \in \mathbb{R}^{n \times d_w}$ and the corresponding aspect vector $\{v_1^a, v_2^a, v_3^a, \ldots, v_m^a\} \in \mathbb{R}^{m \times d_w}$, where $n$ and $m$ are the numbers of words in the sentence and in the aspect respectively. Both vectors are given as input to the semantic representation layer.

In the cross attention layer, the representations obtained from the semantic representation layer are used to determine the relative weights of the aspect and the sentence. The layer first takes the aspect and sentence representations, denoted $h_a \in \mathbb{R}^{m \times d_w}$ and $h_s \in \mathbb{R}^{n \times d_w}$ respectively, and computes the matching matrix $I = h_s \cdot h_a^{\top} \in \mathbb{R}^{n \times m}$. A Softmax function is then applied to $I$ to obtain the attention degree $\alpha$ of the aspect to the sentence and the attention degree $\beta$ of the sentence to the aspect. Lastly, $\beta$ is averaged over the sentence positions to give $\beta^{*}$, which represents the most important parts of the aspect and, together with $\alpha$, is used to calculate the final output $\gamma \in \mathbb{R}^{n}$, as shown in the equations below:

$$\alpha_{ij}=\frac{\exp(I_{ij})}{\sum_{i}\exp(I_{ij})}$$

(1)

$$\beta_{ij}=\frac{\exp(I_{ij})}{\sum_{j}\exp(I_{ij})}$$

(2)

Correspondingly, the important term of the aspect, $\beta^{*}$, is obtained by averaging $\beta$ over the $n$ sentence positions:

$$\beta^{*}_{j}=\frac{1}{n}\sum_{i}\beta_{ij}$$

(3)

Finally, the impact $\gamma$ is computed as

$$\gamma=\alpha\cdot\beta^{*}$$

(4)
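The cross attention computation can be sketched in plain Python as follows. The sketch rests on one reading of Eqs. 1 to 4, stated here as an assumption: $\alpha$ is a Softmax over the sentence positions $i$, $\beta$ is a Softmax over the aspect positions $j$, and $\gamma$ is the matrix-vector product of $\alpha$ and $\beta^{*}$. The function names and the toy inputs are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(h_s, h_a):
    """h_s: n x d sentence states, h_a: m x d aspect states (lists of lists)."""
    n, m = len(h_s), len(h_a)
    # Matching matrix: I[i][j] is the dot product of h_s[i] and h_a[j].
    I = [[sum(x * y for x, y in zip(h_s[i], h_a[j])) for j in range(m)] for i in range(n)]
    # alpha: softmax over sentence positions i, separately for each aspect token j.
    columns = [softmax([I[i][j] for i in range(n)]) for j in range(m)]
    alpha = [[columns[j][i] for j in range(m)] for i in range(n)]
    # beta: softmax over aspect positions j, separately for each sentence token i.
    beta = [softmax(I[i]) for i in range(n)]
    # beta_star[j]: average of beta[i][j] over the n sentence positions.
    beta_star = [sum(beta[i][j] for i in range(n)) / n for j in range(m)]
    # gamma[i] = sum_j alpha[i][j] * beta_star[j]: one weight per sentence token.
    gamma = [sum(alpha[i][j] * beta_star[j] for j in range(m)) for i in range(n)]
    return alpha, beta, beta_star, gamma


# Toy inputs (assumed values): n = 3 sentence tokens, m = 2 aspect tokens, d = 2.
h_s = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_a = [[1.0, 0.5], [0.2, 0.3]]
alpha, beta, beta_star, gamma = cross_attention(h_s, h_a)
```

Under this reading, $\gamma$ is a probability distribution over the sentence tokens, so it can serve directly as a set of attention weights for pooling the sentence representation.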

The classification output layer combines the output of the semantic representation layer with the attention weights $\gamma$ and uses the result to predict the final sentiment polarity for the aspect, as shown in the equations below:

$$\mathrm{r}=\mathrm{h }{\mathrm{w}}^{*}\upgamma$$

(5)

$$p=\mathrm{softmax}(w\cdot r+b)$$

(6)

The term $p$ denotes the predicted probability distribution over the sentiment polarities, and $w$ and $b$ denote the weight matrix and the bias respectively. The model is trained using a cross-entropy loss function with L2 regularization, as shown in Eq. 7.

$$L=-\sum_{i}\sum_{c\in C}\mathbb{1}(y_{i}=c)\,\log p(y_{i}=c)+\lambda\lVert\theta\rVert^{2}$$

(7)

wherein $\mathbb{1}(y_i = c)$ indicates the actual sentiment polarity, $p(y_i = c)$ is the predicted probability of polarity $c$, $\theta$ denotes the model parameters, and $\lambda$ is the regularization parameter.
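A minimal sketch of the classification step (Eq. 6) and the training loss (Eq. 7) is given below. It assumes that the norm term in Eq. 7 is a squared L2 norm over all parameters and that labels are integer class indices; the function names and the toy values are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, b, r):
    """Eq. 6: p = softmax(W r + b), with W given as one row per polarity class."""
    logits = [sum(w_k * r_k for w_k, r_k in zip(row, r)) + b_c for row, b_c in zip(W, b)]
    return softmax(logits)


def regularized_cross_entropy(probs_batch, labels, params, lam):
    """Eq. 7 under the stated assumption: cross-entropy plus lam * squared L2 norm."""
    cross_entropy = -sum(math.log(p[y]) for p, y in zip(probs_batch, labels))
    penalty = lam * sum(t * t for t in params)
    return cross_entropy + penalty


# Toy setup (assumed values): 3 polarity classes, 2 features, one training example.
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
r = [0.4, -1.2]
p = predict(W, b, r)
theta = [w for row in W for w in row] + b
total = regularized_cross_entropy([p], [2], theta, lam=0.01)
```

The indicator in Eq. 7 selects only the log-probability of the true class for each example, which is why the code indexes `p[y]` instead of summing over all classes.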

4.4.3 DISTILBERT Model

The DISTILBERT model is a distilled version of the BERT model that retains almost the same performance as BERT while using about 40% fewer parameters. In particular, the DISTILBERT model drops the token-type embeddings and uses only half of the layers of the BERT model. The main differences between the BERT and DISTILBERT models are summarized in Table 1.

Table 1 Difference between BERT and DISTILBERT models
