1 Introduction

The creation of textual data has recently grown due to the expansion of social media platforms. The rising usage of online technology has changed how people communicate through user-generated content on e-commerce websites, social networks, blogs, etc. Following the tremendous popularity of these technologies, there has been a great interest among the researchers to explore data mining technologies for analyzing the subjective information. One such prominent research area is sentiment analysis (SA), which is used to understand user opinion and identify their view about a particular domain, website, or product. From both a commercial and academic point of view, SA is considered as an important task. Sentiment analysis is a technique that, in general, uses the polarity of the text to determine and classify the emotions and sentiment of a particular opinion or piece of user feedback. Sentiment analysis (SA) is carried out at three levels: document level, sentence level, and aspect level, to ascertain if the text from a document, a phrase, or an aspect expressed is positive, negative, or neutral. Regardless of the entities and other factors present, the majority of techniques now in use attempt to discern the polarity of the text or phrase. Contrarily, aspect-based sentiment analysis (ABSA), which is the focus of SA, aims to identify the aspects of various entities and examine the sentiment expressed for each aspect.

ABSA focuses on identifying the sentiments rather than the structure of the language. An aspect is basically related to an entity and the fundamental principle of an aspect is not restricted to any decision-making process, and can be extended towards the understanding of thoughts, user perspective, influence of social factors and ways of thinking. Hence, ABSA can be considered as an effective tool for understanding the sentiments of the users over a period of time across different domains. Because of its ability to achieve accurate sentiment classification considering different aspects, ABSA has gained huge significance in recent times.

The process involved in ABSA is categorized into three important processing stages such as; Sentiment evolution, Aspect Extraction (AE), and Aspect Sentiment Analysis (ASA) (SE), Several kinds of aspects, including explicit aspects, implicit aspects, aspect words, entities, and opinion target expressions, are retrieved in the first step. The next stage is related to the classification of sentiment polarities for a specific aspect or entity. This stage also involves the formation of interactions and analyzes the inter-relationship, contextual, and semantic relationships between multiple entities to increase sentiment categorization accuracy.The study of the user's emotions towards various components over time is related to the third stage. This comprises the user's social characteristics and experience, which are two key factors in sentiment analysis.

Compared to fundamental sentiment analysis, ABSA is a complex task since it incorporates both sentiments and aspects. The process of ABSA suffers from multifaceted challenges in terms of extracting OTE, aspects with neutral sentiments, and explicit aspects. In addition, it is also complicated to examine the relationship between various data objects for enhanced ABSA. The advancements in the field of natural language processing (NLP) with the aid of AI-based machine learning and deep learning have had a significant impact on the architecture of pre-trained models like ELMo, BERT, Roberta, and DistilBERT. These models were pre-trained using a lot of unlabelled text, and because of their versatility, ABSA performance has increased without the need for labelled data. The use of a hybrid BERT (HybBERT) model, which incorporates BERT, Roberta, and DistilBERT models for ABSA, is the main topic of this research.

The prominent contributions of this research can be summarized as follows:

  • This paper provides a comprehensive evaluation of the ABSA using the proposed HybBERT model.

  • The bi-directional transformer models used in this paper are pre-trained over large-scale unlabeled textural data to represent the language, which can be fine-tuned to perform specific learning-based tasks such as ABSA.

  • The paper presents a case study wherein two models are combined and the performance of the hybrid model is tested against a single model.

The paper is further structured as follows: Section 2 discusses the study of various existing research methodologies on ABSA. Section 3 discusses the proposed methodology for ABSA using the HybBERT model. Section 4 discusses the results of the experimental analysis with respect to different combinations of the ABSA models, and Section 5 concludes the paper by outlining the experimental observations.

2 Related Works

The concept of ABSA is not a straightforward approach and incorporates a lot of challenges. Several researchers have analyzed these challenges and have come up with different solutions that have proven their efficacies in identifying and classifying different sentiments (Mercha & Benbrahim, 2023) [1] (Chandra & Jana, 2020) [2] (Gadri et al., 2022) [3] (Liu et al., 2020) [4]. These techniques have provided pictorial representations and structured approaches to handle complex tasks such as emotion recognition and sentiment analysis. Since an aspect is represented using different words, it might require more than one classification algorithm and in such cases both ML and DL models have shown promising results (Do et al., 2019) [5]. Sentiment Analysis (SA) makes use of both syntactic and semantic information for classifying the polarities (Rezaeinia et al., 2019) [6]. Here, the context of each word in the sequence might be different and the context is realized based on the other words in the sequence. Different DL models such as Convolutional Neural Network (CNN) (Liao et al., 2017) [7], Recurrent Neural Network (RNN) (Usama et al., 2020) (M. Usama, B. Ahmad, E. Song, M. S. Hossain, M. Alrashoud, and G. Muhammad) [8, 45], Long Short Term Memory (LSTM) (Gandhi et al., 2021) [9] etc. attempt to understand the context/aspect of the word with long term dependency. By studying the long-term dependencies, attempts have been made to learn and recognize the features of the word or sentence. Based on this, it can be said that it is crucial for the model to learn the aspect of words by analyzing the aspect of complete sentences without depending on the length and bidirectionality of the aspects of adjacent words in parallel (Wei Song, Zijian Wen, Zhiyong Xiao, and Soon Cheol Park) [10]. A ML based sentiment analysis for predicting different sentiments from the social media data is proposed in (Wu et al., 2020) [11] (Broek-Altenburg & Adam J.Atherly) [32] (Li, Z., Fan, Y., Jiang, B) [41]. By examining the semantic associations between two words, a hybrid strategy integrating bi-directional long short-term memory (Bi-LSTM) and convolutional neural networks (CNN) is intended to detect various labels of emotion derived from psychiatric social texts (P. Bhuvaneshwari, A. Nagaraja Rao, Y. Harold Robinson & M. N. Thippeswamy) [38] (Hengyun Li, Bruce X.B. Yu, Gang Li, Huicai Gao) [43]. Results show that the hybrid Bi-LSTM—CNN model outperforms other models compared to other sentiment analysis models. However, the performance of the model can be improved by capturing more sentiment information. A supervised ML based sentiment analysis approach designed using Adaboost and multilayer perceptron (MLP) models is designed in (Aishwarya et al., 2019) [12]. By identifying the polarity of the available texts from the social media data, the sentiments were classified. Furthermore, the Adaboost-MLP model classified the data into multiple small classes and a novel approach of using uppercase letters and repetitive letters is used to balance the sentiment weight factors. However, a larger dataset is used to train the model and the performance is tested using small samples.For sentiment analysis and emotion recognition, a DL-based multi-task learning architecture is used (Akhtar et al., 2019) [13]. A bi-directional Gated Recurrent Unit (biGRU) model is employed for extracting the contextual data. A context level inter-modal attention technique is implemented for accurately predicting the sentiments using CMU-MOSEI dataset. Although the model achieves better performance in a multi-task learning environment, there is a need to explore other aspects such as emotion classification and intercity prediction in the proposed multitask framework. In addition, it is also important to analyze the sentiment considering the entity or aspect of the sentence. With its ability to overcome the drawback of long term dependency, LSTM models have gained huge significance and the same has been successfully implemented for ABSA (Bao et al., 2019) [14] (Xing et al., 2019) [15] (Xu et al., 2020) (B. Xing, L. Liao, D. Song, J. Wang, F. Zhang, Z. Wang, and H. Huang, 2019) [16, 46]. The effectiveness of LSTM models has been experimentally validated in several works (Al-Smadi et al., 2019) [17] employed the LSTM model for ABSA of Arabic reviews. ABSA performance increased by 39%. (Alexandridis et al., 2021) [18] implemented a knowledge based DL architecture for ABSA by integrating a hybrid Bi-LSTM with convolutional layers and an attention mechanism for enhancing the textual features. The LSTM is combined with fuzzy logic for ABSA in (Sivakumar & Uyyala, 2021) [19] for reviewing mobile phones. A highest accuracy of 96.93% was obtained when compared for other techniques. The review of DL approaches for ABSA presented in (Do et al., 2019) [5] states that the research related to ABSA and deep learning is still in the infant stage and there is a great scope to investigate further. Considering the relationship between an opinion and the aspect, the performance of the ABSA models can be improved by simultaneously extracting and classifying the sentiment, and aspect. Most of the works have focused only on aspect categorization and the models that are used to perform aspect recognition and sentiment analysis have not achieved desired performance. Hence, it is suggested to design a combined approach which can perform both tasks and provide a more precise sentiment analysis at aspect level. In this context, the implementation of pre-trained language models can be investigated for ABSA (Zhang et al., 2022) [20] (Shim et al., 2021) [21] (Troya et al., 2021) [22].

Fundamentally, the learning process of pre-trained language models is different from conventional pre-trained embedding models in terms of word embedding from a given aspect. The pre-trained language models represent the features from unsupervised text and are fine-tuned to overcome the problem of labeled data deficiency for training. Recently, transformer based methods such as BERT (Devlin et al., 2018) [23], Roberta (Wang et al., 2019) [24], DistilBERT (Sanh et al., 2019) [25], XLNet (Yang et al., 2019) [26] etc. are the most prominent bidirectionally pre-trained language models (Mao et al., 2020) [27]. The BERT model consists of several attention layers and is trained to perform downstream tasks such as emotion recognition and sentiment analysis. The BERT model can be trained to obtain deep bidirectional representations from unlabeled text by concatenating all the layers in the model (Sun et al., 2019) [28]. Hence, a pre-trained BERT model can be fine-tuned by adding one extra additional layer for performing a wide range of tasks without requiring any major architectural modifications 9Li et al., 2019) [29]. The performance of the BERT model with other pre-trained transformer models such as RoBERTa, DistilBERT, and XLNet is presented in (Adoma et al., 2020) [30] for identifying emotions from the texts. This research analyzes the sentiment by considering the output of each model and is compared with other models. To classify various emotions, including anger, contempt, sorrow, fear, joy, humiliation, and guilt, all three models are refined using the ISEAR data. Experimental analysis shows that the RoBERTa, XLNet, BERT, and DistilBERT models achieve an accuracy of 74.31%, 72.99%, 70.09%, and 66.93% respectively. These models exhibit better accuracy as an individual model. (Phan et al., 2020) [31] extracted the syntactical features for ABSA using BERT, and RoBERTa. Results show that the proposed approach achieves phenomenal results. Although these models have shown excellent results as an individual model, there is a need for more exploration in this aspect. In future, an ensemble model can be implemented which combines all three models or any two models for improving the accuracy of emotion recognition and sentiment analysis. Considering the excellent attributes of the transformer based models, this research emphasizes the application of Bert, Roberta, and DistilBERT models for ABSA.

The aspect based sentiment analysis is a mixed approach in sentiment analysis which encapsulates multiple challenges. Researchers have successfully handles these challenges and laid out effective solutions as represented by their contributions in the classification of these sentiments (Mercha & Benbrahim, 2023) [1], (Chandra & Jana, 2020) [2], (Gadri et al., 2022) [3], and (Liu et al., 2020) [4]. These solutions utilize the visual aids and structured methodologies for managing all the intricate tasks such as emotion recognition and sentiment analysis.

Research

Contributions

Mercha & Benbrahim

Identified challenges and proposed solutions [1]

Chandra & Jana

Addressed challenges and provided insights [2]

Gadri et al

Proposed effective strategies for sentiment analysis [3]

Liu et al

Introduced innovative approaches for sentiment analysis [4]

LSTM models are know for representing long-term dependency issues which are successfully applied in ABSA (Bao et al., 2019) [14], (Xing et al., 2019) [15], and (Xu et al., 2020) [16]. These model representations have clearly demonstrated substantial performance improvements, for example, 39% enhancement is ABSA performance for Arabic reviews (Al-Smadi et al., 2019) [17].

Research

LSTM Application

Bao et al

Successful LSTM-based ABSA implementation [14]

Xing et al

Improved ABSA using LSTM models [15]

Xu et al

Application of LSTM for effective ABSA [16]

The pre trained models like BERTm Roberta, and DistilBERT have gained prominence in ABSA research (Zhang et al., 2022) [20], (Shim et al., 2021) [21], and (Troya et al., 2021) [22]. These two directional models offer the most unique advantages in the process of fine tuning with an aim to achieve exceptional results in different sentiment related tasks.

Transformer Model

Key Features & Advantages

BERT

Bidirectional contextual representations [23]

RoBERTa

Optimized pre-training for enhanced performance [24]

DistilBERT

Compact version of BERT with efficient training [25]

ABSA have multiple challenges that researchers have tried to addressed through different innovative solutions. DL models, LTSM approaches, and transformer models are held at great positions to advance sentiment analysis, especially within the complex contexts like ABSA.

The concept of ABSA is not a straightforward approach and incorporates a lot of challenges. Several researchers have analyzed these challenges and have come up with different solutions that have proven their efficacies in identifying and classifying different sentiments. Here, the context of each word in the sequence might be different and the context is realized based on the other words in the sequence. By studying the long-term dependencies, attempts have been made to learn and recognize the features of the word or sentence. Based on this, it can be said that it is crucial for the model to learn the aspect of words by analyzing the aspect of complete sentences without depending on the length and bidirectionality of the aspects of adjacent words in parallel. A. Nagaraja Rao, Y. Harold Robinson & M. N. Thippeswamy) [38] (Hengyun Li, Bruce X.B. Yu, Gang Li, Huicai Gao) [43]. Results show that the hybrid Bi-LSTM—CNN model outperforms other models compared to other sentiment analysis models. However, the performance of the model can be improved by capturing more sentiment information.

A bi-directional Gated Recurrent Unit (biGRU) model is employed for extracting the contextual data. A context-level inter-modal attention technique is implemented for accurately predicting sentiments using the CMU-MOSEI dataset. Although the model achieves better performance in a multi-task learning environment, there is a need to explore other aspects, such as emotion classification and intercity prediction, in the proposed multitask framework. In addition, it is also important to analyze the sentiment by considering the entity or aspect of the sentence.

Fundamentally, the learning process of pre-trained language models is different from conventional pre-trained embedding models in terms of word embedding in a given aspect. The pre-trained language models represent the features from unsupervised text and are fine-tuned to overcome the problem of labelled data deficiency for training.

The BERT model consists of several attention layers and is trained to perform downstream tasks such as emotion recognition and sentiment analysis. The BERT model can be trained to obtain deep bidirectional representations from unlabeled text by concatenating all the layers in the model (Sun et al., 2019) [28].

Although these models have shown excellent results as individual models, there is a need for more exploration in this aspect. In the future, an ensemble model can be implemented that combines all three models or any two models to improve the accuracy of emotion recognition and sentiment analysis. Considering the excellent attributes of the transformer-based models, this research emphasizes the application of the Bert, Roberta, and DistilBERT models for ABSA.

3 Proposed Methodology: Aspect-Based Sentiment Analysis (ABSA) using HybBERT model

In order to understand the user's emotion, this study analyses the sentiment from text data while taking into account several factors. For sentiment analysis, this study uses a hybrid BERT model known as the hybBERT model. The proposed hybrid model combines 3 bi-directional transformer based models such as Bert, Roberta, and DistilBERT for simultaneously performing aspect recognition and sentiment analysis. These models are pre-trained over large-scale unlabeled textural data to represent the language and are fine-tuned to perform ABSA. The ABSA is analyzed through different combinations of models wherein two models are combined and the performance is compared with a single model. The work flow of the proposed approach is illustrated in Fig. 1.

Fig. 1
figure 1

Workflow of the proposed ABSA

4 Materials and Methods

From an academic and professional perspective, sentiment analysis is becoming more and more recognised as a crucial task. However, the bulk of current systems aim to identify the overall polarity of a sentence, paragraph, or text span, regardless of the entities (such as laptops and restaurants) and their components (such as the battery, screen, food, and service). Aspect-based sentiment analysis (ABSA), in contrast, is the focus of this task. Its objective is to detect the aspects of the target entities and the sentiment expressed towards each aspect. Datasets of customer reviews with human-authored annotations indicating the specified properties of the target entities and the polarity of each property's sentiment will be made available.

The task specifically consists of the following subtasks:

  • Subtask 1: Extracting aspect terms

    Determine the aspect terms that are present in a set of sentences with pre-identified entities (such as restaurants) and provide a list of all the different aspect terms. A specific aspect of the target entity is identified by an aspect word.

    For instance, "The food was nothing special, but I loved the staff," or "I liked the service and the staff, but not the food. The phrase "The hard disc is very noisy" is the only instance of an aspect term that consists of more than one word (for example, "hard disc").

  • Subtask 2: Polarity of the aspect word

    Determine the polarity of each aspect term in a statement, such as whether it is positive, negative, neutral, or in conflict (both positive and negative).

Here is the dataset link which will be used in the code:

https://huggingface.co/datasets/Yaxin/SemEval2014Task4Raw/viewer/All/train

4.1 Data Acquisition

The first part of the process involves gathering input data from the database, which is known as data collection or acquisition. In this research, the data is collected from the Hugging Face dataset (Geetha & Renuka, 2021) [42]. The dataset features define the internal structure of the dataset and contain high-level information regarding different fields such as the number of columns, types, data labels, and conversion methods. The dataset consists of data labels such as text, aspect terms, aspect categories, domain and sentence Id which helps to predefine the set of classes and are stored in the form of integers.

4.2 Data Preprocessing

The data extracted from the dataset is in basic raw format and needs to be processed before feeding it to the model for training. By removing uncertainties like missing data, null values, noisy data, and other irregularities, the raw data is transformed into a clean dataset at this step. Preprocessing is the basic step in the ABSA process and involves different steps such as:

  • Sorting the dataset according to the column

  • Shuffling the dataset

  • Filtering the rows according to the index

  • Concatenating the datasets with the same column types

The obtained data consisted of multiple columns containing responses, opinions, and feedback. The columns with sentiment labels and individual response are considered in this research for ABSA. Hence, these two columns are subjected to preprocessing. It was observed that few columns consisted of sentiment labels without any textual responses. In addition, the data also consisted of special characters, irrelevant tags, double spacing and redundant expressions, which are also filtered since they can have a negative impact on the performance of sentiment analysis. The dataset, with a total of 4631 rows, is taken into consideration for the analysis, with a total of 2 columns after filtering. The data are divided into training, testing, and validation samples, with 3509 samples being used for training, 1028 samples being used for testing, and 94 samples being used for validation.

4.3 Aspect-based feature extraction

In general, feature extraction is performed extracting relevant features for the sentiment analysis and classification process (Meng et al., 2019) [33]. When the majority of the characteristics are not enough to contribute to the total variance, the dimensionality is reduced by extracting just the relevant features. The computing time will be cut down, and the overall performance will be improved, by removing unnecessary and duplicate features. Aspect-based feature extraction techniques extract the essential features that are crucial for defining the text's primary traits, or aspects. This helps in finding important textual elements that reflect an opinion or a mood. In order to choose pertinent and appropriate features from the text, new feature subsets are acquired at this stage. For aspect-based feature extraction, two bags of words are created, where the first group contains aspects and the second group contains the tendency of sentiment polarity.

In aspect based feature extraction, it is highly challenging to extract explicit aspects, aspect categories, and Opinion Term Extraction (OTE) based on which the sentiment is expressed (Zhang et al., 2022) [34]. An explicit aspect here means that the sentiment is present in the text, for instance: “the food in this restaurant is great” is an example of an explicit aspect where food is an explicit aspect and restaurant is the explicit entity. While performing aspect extraction, the relation between different aspects is also analyzed for identifying the coherent, consistent, and aspects with high similarity from the dataset in order to improve the overall representation of the extracted aspects.

4.4 Aspect Classification using HybBERT model

This study implements a HybBERT model for aspect classification and ABSA. During the training phase, the models will use examples from the data set to determine how to transform the input text into a class of pertinent attributes (Lu Xu, Lidong Bing, Wei Lu, Fei Huang, November 2020) [25]. The feature vectors and appropriate class labels will be given for the proposed HybBERT model. The classification models used for ABSA are explained as follows:

4.4.1 BERT Model

BERT is a bi-directional transformer model which is used to pre-train a large volume of unlabeled textural data to learn how to represent a language and to fine tune the model for performing a specific task. BERT model outperforms when applied for natural language processing (NLP) and sentiment analysis tasks (Howard & Ruder, 2018) [35] (Radford et al., 2018) [36]. The improved performance of the BERT model is due to the bidirectional transformer and its ability to pre-train the model for predicting the next sentence. BERT employs a fine tuning mechanism which almost eliminates the need for a specific architecture for performing each task. Hence, the BERT model is considered as an intelligence model which can reduce the necessity of prior knowledge before designing the model and instead it enables the model to learn from the available data. The BERT model has two architectural settings which are the BERTBASE and BERTLARGE. The BERTBASE has 12 layers with 768 hidden dimensions and 12 attention heads (in transformer) with 110 M number of parameters. The BERTLARGE, on the other hand, has 24 layers, 1024 hidden dimensions, and 16 attention heads (in a transformer) with 340 M parameters.

In this research the BERT model is fine-tuned to perform two tasks namely aspect extraction (AE) and ABSA. For aspect extraction, the BERT model is provided with continuous samples of data which are labeled as A and B as aspects. The input text ‘n’ words are constructed as x = (× 1, …., xm). After h = BERT (x), a dense layer along with a Softmax layer is applied for each position of the text denoted as l = softmax (W * h + b) where, W denotes the text in the word, l is the length of the sequence, h and b are the dimensions. Softmax is applied along with the dimension of labels for each text and the labels are predicted based on the position of the text.

For aspect classification, the BERT model is fine tuned to classify the polarity of the sentiment (Positive, negative, or neutral) based on the aspect extracted from the text. For ABSA, the model is provided with two inputs namely an aspect and a text defining the aspect. Let x = (p1,.,pm) be an aspect with ‘n’ number of tokens, and x = (t1,.,tm) defining the text containing the aspect.

Similar to aspect extraction, after h = BERT (x), the learning representation of the BERT model is leveraged to obtain the polarity of the text, based on which the polarity of the text is calculated.

The fine tuning of the BERT model is simple and straightforward since the learning capability of the transformer supports the model in various down streaming tasks whether in terms of classifying the single text or multiple texts by interchanging the respective inputs and outputs (Azhar & Khodra, 2020) [37]. For classifying the sentiments in multiple text pairs, the BERT model derives a common pattern which can automatically encode the text pairs and performs bidirectional cross attention between two sentences.

4.4.2 ROBERTA Model

An improved version of the BERT model, the ROBERTA model was first presented by Facebook. The ROBERTA model is obtained as a retrained BERT model with more computing capabilities and better training process. In order to achieve enhanced training, ROBERTA eliminates the Next Sentence Prediction (NSP) task from the existing BERT model and employs a dynamic masking approach such that the masked token can be changed while training the model for several epochs. This factor also enables the ROBERTA model to train using large training samples. The text obtained from the dataset is given as input to the model, and the outputs are obtained from the last layer of the model. While using the ROBERTA model for ABSA, the model is fine-tuned using a cross-entropy loss function, which enables the model to effectively leverage its potential in terms of ABSA. Another important advantage of the ROBERTA model is utilization of dynamic masking, wherein the model is provided with a dynamic set of data samples from the text instead of providing only a fixed set of data samples as in the BERT model. This improves the learning ability of the ROBERTA model by enabling it to learn from a diversified set of data. In addition, the dynamic masking also makes the model more resilient to the changes in the input data, which is more important while handling diverse, irregular and inconsistent text. In addition, the ROBERTA model also employs a No Mask Left Behind (NMLB) technique which makes sure that all the data samples are masked at least once during training unlike BERT models which uses a Masked Language Modeling (MLM) technique that masks only 15% of the data samples. This improves the representation ability of the text and helps the model to exhibit better ABSA performance. For ABSA, the ROBERTA model consists of four modules namely, (i) a word embedding layer, (ii) a semantic representation layer, (iii) cross attention layer, and (iv) a classification output layer.

In the word embedding layer, the ROBERTA model is pre-trained using multiple embedded layers of sentences and aspects to obtain an improved representation of aspects. The transformer used in the ROBERTA model helps in capturing the bidirectional relationship between the sentences and helps in mapping each word wi in the sentence and its aspect to a low dimensional vector vi ∈ Rdw, where dw is the dimension on the word vector. The pre-training of the model results in the sentences and aspects that caters the sentence vector {v1s, v2s, v3s,. vns} ∈ Rn *dw and the corresponding aspect vector {v1s, v2s, v3s,…..vns} ∈ Rm *dw. Both the sentence and the aspect vectors obtained from the word embedding layer are embedded with the word vector and are given as input to the semantic representation layer. In the cross attention layer, the word vectors obtained from the semantic representation layer are used to determine the effect of weight of the aspect and sentence. Initially, the cross attention layer takes the aspect and sentence denoted as ha ∈ Rn *dw and hs ∈ Rn *dw respectively and later the matching matrix I = hs * ha is calculated. Further, a Softmax function is applied to the matrix I to obtain a degree α and β of the sentence to the aspect. Lastly, the average of the attention is calculated using the attention degree β of the sentence to the aspect is calculated. The most important aspect is represented using term β* which along with degree α is used to calculate the final output denoted as γ ∈ Rm, as shown in below equations:

$$\mathrm{\alpha ij}=\mathrm{exp }(\mathrm{Iij}) /\mathrm{ \Sigma i }(\mathrm{Iij})$$
(1)
$$\mathrm{\beta ij}=\mathrm{exp }(\mathrm{Iij}) /\mathrm{ \Sigma i }(\mathrm{Iij})$$
(2)

Correspondingly, the important term of aspect β* is represented as follows:

$${\upbeta }^{*}=(1/\mathrm{n})\Sigma (\upbeta )$$
(3)

Finally, the impact is denoted as

$$\upgamma ={\mathrm{\alpha }}^{*} {\upbeta }^{*}$$
(4)

The classification output layer generates the output of the model by convoluting the results of the semantic representation layer and uses it to predict the final polarity of the sentiment corresponding to the aspect, as shown in below equations:

$$\mathrm{r}=\mathrm{h }{\mathrm{w}}^{*}\upgamma$$
(5)
$$\mathrm p=\mathrm{softmax}\;(\mathrm w^\ast\mathrm r+\mathrm b)$$
(6)

The term ‘p’ defines the probability distribution of sentiment analysis, w and b define the weight and bias of the matrices respectively. The model is trained using a cross-entropy loss function and L1 regularization, as shown in Eq. 7.

$$\mathrm{L}1=-\mathrm{\Sigma i \Sigma c}\in \mathrm{C }[((\mathrm{yi}=\mathrm{c})*\mathrm{log }(\mathrm{p }(\mathrm{yi}=\mathrm{c}))+\uplambda ||\uptheta ||2)]$$
(7)

wherein p (yi = c) is the predicted sentiment polarity and (yi = c) is the actual sentiment polarity, and λ is the regularization parameter.

4.4.3 DISTILBERT Model

The DISTILBERT model is a distilled version of the BERT model that exhibits almost the same performance as the BERT model but uses only half the number of parameters. In particular, the DISTILBERT model does not incorporate any word-based embeddings and makes use of only half of the layers from the BERT model. The main difference between the BERT and DISTILBERT models are illustrated in Table 1.

Table 1 Difference between BERT and DISTILBERT models

The distillation mechanism used by the DISTILBERT model approximates the large neural network model by using a smaller one. The main idea behind this model is to train a larger neural network once and the output distributions can be approximated using a smaller model. DISTILBERT models are trained using a triple loss function which is a linear combination of the distillation loss (Lce) with other loss functions such as training loss, MLM loss (LMLM), and a cosine embedding loss (Lcos) which establishes a coordination between the aspect and sentence using the state vectors. The distillation loss is determined as follows:

$$\mathrm{Lce}=\mathrm{\Sigma i ti}*\mathrm{log }(\mathrm{si})$$
(8)

where, ti (si) is the probability of the estimation.

Using the triple loss function, a 40% smaller transformer (Vaswani et al., 2017) [38] can be pre-trained using distillation with 60% faster interference time. Studies have shown that the triple loss function parameters have the capacity to achieve excellent performance (Sanh et al., 2019) [39]. This research uses a pre-trained DISTILBERT model from the hugging face transformer library for ABSA, which is distilled from the fundamental BERT model. The excellent performance of the DISTILBERT model is due to its ability to extract bidirectional contextual information from the text data during the training process. Besides, the compactness of the DISTILBERT model increases the scalability of the model and helps them achieve better performance in real-time sentiment analysis tasks ( Kevin Scaria, Himanshu Gupta, Siddharth Goyal, Saurabh Arjun Sawant Swaroop) (Lu Xu, Lidong Bing, Wei Lu, Fei Huang) [44, 46] while retaining 97% of the BERT’s performance. Figure 2 shows a schematic illustration of the three models utilized in the ABSA method.

Fig. 2
figure 2

Process of ABSA using different transformer model combinations

Figure 2 is a schematic illustration of ABSA Models with BERT, RoBERTa, and DistilBERT that highlights their integral roles. These models have played an instrumental role in this study. The Aspect Extraction Model, which has been represented in blue is reflected to harness the power of BERT. BERT has excelled in understanding the context where aspects or features are mentioned within the textual data with the extraction of these aspects and providing an understanding of what we have discussed.

The Sentiment classification model has leveraged RoBERTa which makes it a robust variant of BERT. excelling significantly at sentiment classification which is detrimental in classifying it as positive, negative, or neutral which assures accuracy in sentiment analysis.

The contextual analysis model apparently utilizes the DistilBERT model which is a distilled version of BERT. This efficiently processes and analyzes the relationship between aspects and sentiments within a relatable broader context of the text ensuring the correct attributions of sentiments while considering contextual nuances and linguistic intricacies.

Together these models synergize while offering a comprehensive approach to ABSA, BERT, RoBERTa, and DistilBERT's unique capabilities enabling nuanced and context-aware sentiment analysis while providing valuable insights into the sentiment associated with specific aspects as discussed in textual data.

The feature extractor is utilized in the ABSA technique to extract feature vectors from a brand-new text that has never been seen before. The model then generates the results using these feature vectors. Each text in the dataset has one of three sentiments: positive, negative, or neutral (Kevin Scaria, Himanshu Gupta Siddharth Goyal, Saurabh, Arjun Sawant Swaroop Mishra Chitta Baral [44]. The sentiment score, which is based on positive and negative emotions and is represented by the equation below, may be used to assess each text's sentiment. (Ruz and others, 2020) [40].

$$\mathrm{Sentiment}\;\mathrm{score}(\mathrm{SC})=(\mathrm{positive}-\mathrm{negative})/(\mathrm{positive}+\mathrm{negative}+2)$$
(9)

As a result, the positive and negative describe the total number of positive and negative words in the text, and the SC may be determined using a discrete 2-valued variable C that also defines the emotion class and falls between -1 and 1, i.e., C ∈ {-1, 1}.

The term ‘C’ records the sentiment values and their respective classes. In certain cases, it is difficult to identify the aspects from the sentiments based on the polarity values and in such cases, certain constraints (as shown in the Eq. 10) need to be followed to identify whether the sentiment is positive or negative.

$$\begin{array}{l}\mathbf C=1(\mathbf p\mathbf o\mathbf s\mathbf i\mathbf t\mathbf i\mathbf v\mathbf e\boldsymbol\;\mathbf t\mathbf w\mathbf e\mathbf e\mathbf t)\;\mathbf i\mathbf f\boldsymbol\;\mathbf S\mathbf e\mathbf n\mathbf t\mathbf i\mathbf m\mathbf e\mathbf n\mathbf t\boldsymbol\;\mathbf s\mathbf c\mathbf o\mathbf r\mathbf e\geq0.1\\=-1(\mathbf n\mathbf e\mathbf g\mathbf a\mathbf t\mathbf i\mathbf v\mathbf e\boldsymbol\;\mathbf t\mathbf w\mathbf e\mathbf e\mathbf t)\;\mathbf i\mathbf f\boldsymbol\;\mathbf S\mathbf e\mathbf n\mathbf t\mathbf i\mathbf m\mathbf e\mathbf n\mathbf t\boldsymbol\;\mathbf s\mathbf c\mathbf o\mathbf r\mathbf e<0.1.\end{array}$$
(10)

The polarity of texts that have aspects is assessed when they are used as input to the models to determine if the text is positive, negative, or neutral. The results enable the identification and classification of the text's sentiment. The pseudocode for this process is given as follows:

Pseudocode

figure a

Most of the studies on ABSA have focused on the implementation of the transformer models as an individual model. This research emphasizes the analysis of three different combinations of the transformer model such as the BERT—ROBERTA model, BERT—DISTILBERT model, and ROBERTA—DISTILBERT model. The performance of these model combinations are discussed in the next section.

5 Results Analysis and Discussion

In Aspect-Based Sentiment Analysis (ABSA), the primary focus is on extracting and analysing sentiment expressed towards specific aspects or features within a piece of text, such as a customer review. To perform ABSA effectively, we need to consider several key aspects of the analysis process:

Aspect Identification and Extraction

This is the foundation of ABSA. Identifying and extracting the aspects or features being discussed in the text is crucial. Aspects can vary depending on the domain and context. Techniques for aspect extraction can include rule-based methods, dependency parsing, or utilising pre-trained models.

Sentiment Polarity Detection

After extracting aspects, we need to determine the sentiment polarity associated with each aspect mentioned in the text. Sentiment polarity typically falls into categories like positive, negative, neutral, or more fine-grained scales. This step often involves sentiment lexicons, machine learning models, or deep learning approaches for sentiment classification.

Based on a variety of performance criteria, including F1 Score, Recall, Accuracy, and Support, the suggested BERT—ROBERTA model, BERT—DISTILBERT model, and ROBERTA—DISTILBERT model's performance is assessed. The sections below show the simulation results for sentimental analysis, and the confusion matrix was used to gauge how effective the suggested strategy is. The performance of the classifier is assessed using a confusion matrix employing the four elements TP, FP, TN, and FN. The phrases True positive, False positive, False negative, and True negative are TP, FP, FN, and TN, respectively. An illustration of the confusion matrix is given in Fig. 3

Fig. 3
figure 3

Confusion matrix

Here, TP defines the number of correctly identified positive sentiments, TN defines the number of correctly identified negative sentiments, FP is the inaccurately identified positive sentiments, and FN is the inaccurately identified negative sentiments. The expression of the different performance metrics are given in the below expressions as follows:

  • Accuracy:

    $$\begin{array}{l}Accuracy=\frac{( No of correctly identified sentiments)}{( Total number of sentiments in the test dataset)}\\ {\varvec{A}}{\varvec{c}}{\varvec{c}}{\varvec{u}}{\varvec{r}}{\varvec{a}}{\varvec{c}}{\varvec{y}}=\frac{TP +TN}{TP +TN+FP+FN}\end{array}$$
    (11)
  • Precision:

Precision is defined as the ratio of the correctly identified positive sentiments which are relevant and is given as:

$${\varvec{P}}{\varvec{r}}{\varvec{e}}{\varvec{c}}{\varvec{i}}{\varvec{s}}{\varvec{i}}{\varvec{o}}{\varvec{n}}=\frac{TP}{TP+ FP}$$
(12)
  • Recall:

Recall for a function is determined as the ratio of the correctly identified positive sentiments to the sum of true positive and false positive. The expression for recall is given as shown in equation 13.

$$Recall=\frac{TP}{TP + FN}$$
(13)

5.1 F1 Score

F1 score is also termed as an F measure score which is used to determine the harmonic mean of the precision and recall. F1 score is closely related to the accuracy of the classifier whose value lies between 0 and 1, where 0 and 1 represents the worst and best values respectively. F1 score can also be used to measure the performance of the model wherein the datasets are highly imbalanced since it uses both recall and precision to obtain an optimal value. Even if there are fewer positive samples compared to negative samples, the expression used to measure the F1 score will weigh the metric value. Correspondingly, F1 score is defined as:

$$F1 score=\frac{2 *Precision * Recall}{Precision + Recall}$$
(14)

5.2 Performance of BERT, ROBERTA, and DISTILBERT models

Initially, the performance of the BERT, ROBERTA, and DISTILBERT models is evaluated as individually and the classification report of the respective models are discussed as follows:

  • Classification Performance of BERT model

The performance of the BERT model is shown in Table 2 and the corresponding confusion matrix is shown in Figure 4.

Table 2 Classification report for sentiment analysis using the BERT model
Fig. 4
figure 4

Confusion matrix for the BERT model

This figure is more like a report card representation for the BERt model that we have made use in the ABSA. It is a representation of how efficiently the model has performed in the classification of sentiments like the positives or negatives for different aspects of the text. This chart is a better way for us to analyze how well a BERT model is at understanding sentiment. It also reveals where the model is performing well and where an improvement is needed in ABSA something very similar to checking how many answers on a test have been right or wrong.

Precision also referred to as the Positive Predictive Value, indicates the measure of the proportion of appropriately defined positive instances out of all the total instances declared as positive for each class. A high precision is an indication of all the instances predicted as positive and so on for each class. A high precision is an indication of instances are classified as positive and likely to be positive.

Recall is also referred to as Sensitivity or True Positive Rate that is a measurement of the proportion that currently predicts positive instances out of all the actual instances. A high recall indicates that the model effectively captures most of the positive instances.

Support is a precise indication of the number of instances prescribed in each class that calculates the metrics. It is an understanding towards relative prevalence in the dataset.

Accuracy is the measurement of the proportion of aptly predicted instances from all the instances. It is a common metric for evaluation the entire performance of a model.

The represented information makes the BERT model to be aptly performing in the task of sentiment analysis, with high accuracy, precision, recall, and F1 score for each class. The results of the classification shows that the BERT model exhibits a phenomenal accuracy of 97.24%, F1 score of 97.19%, precision of 97.35%. And recall of 94.24%.

  • Classification Performance of ROBERTA model

The performance of the ROBERTA model is shown in Table 3 and the corresponding confusion matrix is shown in Figure 5.

Table 3 Classification report for sentiment analysis using the ROBERTA model
Fig. 5
figure 5

Confusion matrix for the ROBERTA mode

  • Classification Performance of DISTILBERT model

The classification report of the DISTILBERT model is shown in Table 4 and the corresponding confusion matrix is shown in Figure 6.

Table 4 Classification report for sentiment analysis using the DISTILBERT model
Fig. 6
figure 6

Confusion matrix for the DISTILBERT model

The DISTILBERT model achieves an outstanding classification performance by achieving 100% in terms of all evaluation metrics. In comparison to the BERT model and ROBERTA model the DISTILBERT model outperforms with an average accuracy of 100% and average precision of precision, recall, and F1 score of 100% respectively. The results show that the DISTILBERT model is more appropriate for performing real time ABSA and in terms of performance, BERT can be opted as second best model and ROBERTA is the least preferred.

5.3 Performance of different combination of transformer models

This section discusses the performance of different combinations of transformer models such as BERT - ROBERTA model, BERT - DISTILBERT model, and ROBERTA – DISTILBERT models.

  • Performance of final model for ABSA

The ABSA performance of the final model is shown in Table 2 and the corresponding confusion matrix is shown in Figure 7

Fig. 7
figure 7

Confusion matrix for the final model

The results of the classification shows that the final model achieves better classification performance with an accuracy of 97%, F1 score of 97%, precision of 97% and a recall of 97%.

  • Combining DISTILBERT model and ROBERTA model

The performance of the DISTILBERT model and ROBERTA model is shown in Tables 5 and 6 and the corresponding confusion matrix for ABSA is shown in Figure 8.

Table 5 Classification report for sentiment analysis using the final model
Table 6 Classification report for sentiment analysis using the DISTILBERT—ROBERTA model
Fig. 8
figure 8

Confusion matrix for the DISTILBERT—ROBERTA model

As inferred from the results, the excellent performance of the DISTILBERT - ROBERTA model validates its adaptability in real time ABSA tasks. This combination achieves 100% accuracy along with other performance evaluation metrics.

  • Combining DISTILBERT model and BERT model

The performance of the DISTILBERT model and BERT model is shown in Table 7 and the corresponding confusion matrix for ABSA is shown in Figure 9.

Table 7 Classification report for sentiment analysis using the DISTILBERT—BERT model
Fig. 9
figure 9

Confusion matrix for the DISTILBERT—BERT model

Similar to the DISTILBERT - ROBERTA model, the combination of DISTILBERT - BERT model also achieves 100% accuracy along with other performance evaluation metrics. This shows that the DISTILBERT model combined with any other model can help in achieving phenomenal results.

  • Combining ROBERTA model and BERT model

The performance of the ROBERTA - BERT model is shown in Table 8 and the corresponding confusion matrix for ABSA is shown in Figure 10.

Table 8 Classification report for sentiment analysis using the ROBERTA—BERT model
Fig. 10
figure 10

Confusion matrix for the ROBERTA—BERT model

The classification report of the ROBERTA - BERT model is shown in Table 8. The combined ROBERTA - BERT model achieves a moderate accuracy of 67% along with other performance evaluation metrics. Results show that the performance of the ROBERTA model underperforms when used as an individual classifier or combined with other models.

  • Performance of the combined dual model predictions

The performance of the combined dual model predictions is shown in Table 9 and the corresponding confusion matrix for ABSA is shown in Figure 11

Table 9 Classification report for sentiment analysis using the combined dual model prediction
Fig. 11
figure 11

Confusion matrix for the combined dual model prediction

The classification report of the combined dual model is shown in Table 9. Similar to the DISTILBERT model and DISTILBERT- BERT model, the combined dual model achieves an excellent accuracy of 100% along with other performance evaluation metrics.

The confusion matrix obtained after experimental analysis is presented from Figures 4, 5, 6, 7, 8, 9, 10 and 11 for BERT, ROBERTA, and DISTILBERT models as individual models and the combined DISTILBERT- BERT, BERT - ROBERTA, and DISTILBERT- ROBERTA models along with final model and combined dual model. The classification report is generated for different candidate models for validation data.

5.4 Comparative Analysis

This section discusses the comparative analysis of all the transformer models with different combinations. Here, the performance is compared with respect to different case studies as discussed below:

  • Case i: BERT- ROBERTA model: Here, the performance of the combined Bert-Roberta model is evaluated, and the results are compared with those of the Distilbert model.

  • Case ii: BERT- DISTILBERT model: Here, the performance of the combined Bert-Distilbert model is evaluated, and the results are compared with those of the ROBERTA model.

  • Case iii: DISTILBERT- ROBERTA model: Here, the performance of the combined DISTILBERT-ROBERTA model is evaluated, and the results are compared with the BERT model.

  • Case 1: Combination of all models

The classification report of all combinations of the models are compared and the same is tabulated in Table 10 and the graphical representation of the ABSA performance is illustrated in Figure 12.

Table 10 Classification report of all model combinations
Fig. 12
figure 12

Classification performance of all model combinations

  • Case 2: DISTILBERT, ROBERTA, BERT, with DISTILBERT + ROBERTA + BERT

The classification performance of the individual models with combined models is shown in Table 11 and Figure 13.

Table 11 Classification report of individual models with DISTILBERT + ROBERTA + BERT
Fig. 13
figure 13

Classification performance of individual models with DISTILBERT + ROBERTA + BERT

  • Case 3: DISTILBERT, ROBERTA with DISTILBERT + ROBERTA

The classification performance of the DISTILBERT and ROBERTA models with combined DISTILBERT + ROBERTA model is shown in Table 12 and Figure 14.

Table 12 Classification report of DISTILBERT, ROBERTA models with DISTILBERT + ROBERTA
Fig. 14
figure 14

Classification performance of DISTILBERT, ROBERTA models with DISTILBERT + ROBERTA

  • Case 4: DISTILBERT, BERT with DISTILBERT + BERT

The classification performance of the DISTILBERT and BERT models with combined DISTILBERT + BERT model is shown in Table 13 and Figure 15.

Table 13 Classification report of DISTILBERT, BERT models with DISTILBERT + BERT
Fig. 15
figure 15

Classification performance of DISTILBERT, BERT models with DISTILBERT + BERT

  • Case 5: DISTILBERT, BERT with DISTILBERT + BERT

The classification performance of ROBERTA and BERT models with the ROBERTA + BERT model is shown in Table 14 and Figure 16.

Table 14 Classification report of ROBERTA, BERT models with ROBERTA + BERT
Fig. 16
figure 16

Classification performance of ROBERTA, BERT models with ROBERTA + BERT

  • Case 6: Combined dual models

The classification performance of the combined dual models is shown in Table 15 and Figure 17.

Table 15 Combined dual models classification report
Fig. 17
figure 17

Classification performance of combined dual models

As inferred from the classification report, the order of the ABSA performance of different models can be represented in the decreasing order as ROBERTA, BERT, and DISTILBERT models with a classification accuracy of 67 %, 97 %, and 100 % respectively. In all cases, the DISTILBERT model exhibits superior performance as both individual and hybrid classifier. Since all models were successful in identifying and classifying the sentiments from the data, it can be said that these models are effective in ABSA processes.

5.5 Discussion of Findings & Contribution

We have implemented a novel approach in this study related to the HyBERT model, which focuses on the strengths of transformer models like BERT, ROBERTA, and DISTILBERT and enhances aspect-based sentiment analysis (ABSA) and manages sentiment classification. Using textual data sourced from the Hugging Face Dataset, our research is centred on analysing the sentiments across different aspects. Our investigation encompasses a wide range of case analyses that assess the performance of ABSA on an individual and combined basis.

5.6 Analysis of Transformer Model Performance

Our evaluations were conducted by utilising the essential performance metrics that include the F1 score, recall, accuracy, and support. Undoubtedly, with individual applications or when employed with the BERT and RoBERTa models, the DISTILBERT model exhibits an impressive accuracy of 100%. This detailed categorization uncovers the consistency of this achievement by taking into consideration all combinations of all models. The BERT model, when closely observed, demonstrated a strong accuracy rate of 97%. Contrastingly, the performance of the ROBERTA model was lower and yielded an average accuracy of just 67%.

6 Validation Of HyBERT Model

The outcomes of our research validate the effectiveness of our proposed ByBERT model for real-time ABSA applications. Our model outperforms other models in terms of accuracy for both aspect-based sentiment analysis and sentiment categorization. These high levels of remarkable accuracy, which are achieved by the HyBERT model, reveal the potential to enhance the decision-making process based on reliable sentiment insights.

6.1 Significance of Future Implications

Our research has substantial applications in the fields of sentiment analysis and natural language processing. By blending the strengths of the transformer models, we extend a remarkable understanding of the sentiment complications represented in the textual data. Moving ahead, our study encourages the deployment of more difficult assemblies of transformer models that further elevate accuracy and applications. In addition to this, the principles and insights gained have also extended to other tasks that are text-related, which further contributes to deeper comprehensions of language-context interactions.

6.2 Real-time case study

These days, rating and reviewing culinary recipes, as well as submitting, looking for, and downloading them, have become daily routines. On YouTube, millions of users are looking to exchange recipes. Through user feedback, a user spends a lot of time looking for the greatest cooking recipe. Sentiment analysis and opinion mining are essential tools for gathering information to ascertain what people are thinking. This sentiment-based real-time system that mines YouTube meta-data (likes, dislikes, views, and comments) in order to extract significant aspects related to culinary recipes and detect opinion polarity in accordance with these qualities is a real-time case study. To enhance the functionality of the system, a few authors worked on the vocabulary of cooking instructions and suggested some algorithms built on sentiment bags, depending on certain terms connected to food emotions and injections.

Or another example can be Aspect-Based Sentiment Analysis, which provides valuable insights into different aspects of the restaurant's performance, allowing the management to focus on improving specific areas to enhance the overall customer experience. For example, the restaurant might want to work on speeding up service and addressing pricing concerns to attract more customers and improve their overall rating.

6.3 Predominance of the Study Over Existing Studies of the Domain

The predominance of this study is largely influenced and encapsulated in the innovative approach, which involves a comprehensive evaluation of the transformer-based models for ABSA and the classification of sentiments. While previous studies have contributed valuable insights into sentiment analysis and ABSA, our study has a distinctive aspect that further elevates its significance.

HyBERT Model: Our study is an introduction to the HyBERT model, which is a novel hybrid approach that extends the power of the BERT, ROBERTA, and DISTILBERT transformer models. This integration harnesses the unique power of each model, which helps in achieving superior accuracy and mixed sentiment analysis.

Comprehensive Performance Analysis: Evaluation of the Performance of Transformer Models that Cover Multiple Aspects of ABSA and Sentiment Classification This study is a systematic comparison of individual models that combine models and other model combinations that leverage a comprehensive understanding of capabilities.

Diversified Evaluation Metrics: Employing evaluation metrics inclusive of F1 score, recall, accuracy, and support that comprehensively assess the performance of models The holistic approach is an assurance that our findings are robust and reliable and capture various performance dimensions.

Future-Ready Recommendations: Our study aims for results and makes actionable recommendations for future research and challenges in implementation. Highlighting complex avenues leverage strategies while we feel implied towards the advanced and impactful sentiment analysis techniques.

7 Conclusion

Our study not only introduces the concept of a novel ByBERT model that significantly improves ABSA performance but also uncovers the importance of transformer-based techniques through sentiment analysis. These particular strengths and synergies among the BERT, RoBERTa, and DistilBERT models are examples of enhanced accuracy and insight generation. Our research actually paves the way for more sophisticated and effective methodologies related to sentiment analysis, ultimately facilitating the process of more informed decision-making across various domains. Furthermore, our research lays the path for accepting transformer-based approaches in the intricate ABA scenarios, which therefore usher in a more detailed understanding of sentiment with multiple textual domains.

A novel HybBERT model is presented in this paper, which combines the efficacy of the transformer models such as BERT, ROBERTA, and DISTILBERT for aspect-based sentiment analysis and sentiment classification. The models are trained with the textual data containing aspects from the Hugging Face dataset to analyze the sentiments of the textual data based on different aspects. This research conducted different case analysis for analyzing the performance of the transformer models, the performance of combined dual models, and the performance of different combinations of the models for ABSA. The proposed approach's performance was evaluated by simulating the gathered data with respect to several performance measures such as F1 score, recall, accuracy, and support. When used alone or in conjunction with the BERT and ROBERTA models, the DISTIBERT model's accuracy can reach 100%, as can be seen from the categorization report for all potential combinations. The BERT model, on the other hand, obtained the second-best result with an accuracy of 97%. The ROBERTA model underperforms compared to other models, with an average accuracy of 67%. The results verify the proposed HybBERT model's usefulness for performing real-time ABSA by attaining superior accuracy for both aspect-based sentiment analysis and sentiment categorization.