
1 Introduction

Nowadays, sentiment analysis is a rapidly developing domain. A growing number of models, methods and algorithms help users form an opinion about a particular topic, person, issue, service, etc. [1, 5]. Advances in artificial intelligence methods keep improving the results obtained in this area.

To obtain better accuracy, more complicated model structures and more sophisticated methods are used. A problem arises when the user asks how a particular result was achieved or how a particular sample influenced the final model [34]. This is a problem of the reliability of the system [35].

In this paper we present a survey of explainable artificial intelligence (XAI) methods that are used to increase the user's trust in sentiment analysis and opinion mining systems. The paper is organized as follows: in Sect. 2 we present the basic concepts related to sentiment analysis and XAI. In Sect. 3 we describe related work on XAI methods in sentiment analysis. Trends and challenges are discussed in Sect. 4. Final remarks and a summary are given in Sect. 5.

2 Background

In this section we present the basic ideas of sentiment analysis, explainable artificial intelligence and feature selection. We provide definitions and a description of the problem.

2.1 Sentiment Analysis

Nowadays, sentiment analysis is a very important issue as it can influence many aspects of everyday life. Before deciding to buy or order a product or service, a user tries to find the best offer, and increasingly often he or she also looks for opinions from other users about that product or service.

In the last few years, the development of e-commerce systems and social networks has allowed users to share their opinions easily [31]. On the other hand, a user can find a huge number of, e.g., product reviews, so it is impossible to process all of this information manually. Many systems offer recommendation or decision support algorithms to improve the user experience. Sentiment analysis techniques can additionally improve the accuracy of recommendations, as they reflect users' opinions.

The most popular tasks based on sentiment analysis are opinion mining [31], fake news detection [28, 29] and stance detection [10]. The main goal of sentiment analysis is to extract opinions from different modalities, e.g. text, image, video, etc., and usually to combine them to obtain a final polarity. A problem of opinion veracity and credibility arises, which leads to fake news detection: sentiment analysis approaches can be used to judge whether a news item is true or fake. The stance detection problem is related to a user's attitude toward a situation or an event: the user can agree or disagree with the statements of other users.

According to Phan et al. [29], “Sentiment is the feeling, attitude, evaluation, or emotion of users toward specific aspects of topics or for the topics”. The set of possible sentiment values can be defined in many ways, e.g. [15, 17, 29]:

  • s = {positive, negative}.

  • s = {positive, neutral, negative}.

  • s = {positive, neutral, negative, mixed}.

  • s = {strong/very positive, positive, neutral, negative, strong/very negative}.

  • s = {very very negative, very negative, negative, somewhat negative, neutral, somewhat positive, positive, very positive, very very positive}.

Sentiment analysis is a “process used to determine the sentiment orientation in opinions” [29]. The process can be treated as a classification problem: classify a given opinion o toward a specific aspect or topic into one sentiment polarity from the set s [6]. Sentiment analysis can be performed at three levels: document level (judging the polarity of a whole document, e.g. the final conclusions of a report), sentence level (the polarity of each sentence) and aspect level (the polarity towards a particular aspect).

Phan et al. [29] define the problem in a broader way: sentiment is the attitude of a particular user u at a timestamp t towards a given topic p. The user u delivers an opinion about the topic p, and the task is to judge whether the opinion is positive or negative.

The most popular methods for sentiment analysis are based either on machine learning or on lexicon approaches (e.g. corpus- or dictionary-based approaches) [22, 29, 39]. The first group contains supervised methods (e.g. probabilistic classifiers: naive Bayes, Bayesian networks, maximum entropy; linear classifiers such as SVM; neural networks, decision trees, rule-based methods, etc.), semi-supervised methods (e.g. self-training, graph-based methods, generative models), unsupervised methods (k-means, fuzzy c-means, agglomerative and divisive algorithms) and deep learning methods (RNN, CNN, LSTM, GNN, GCN, etc.) [37]. There are also many hybrid approaches that combine machine learning with lexicon-based approaches, especially deep neural networks with lexicon-based methods. Usually, methods from this last group obtain the best results.
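
To make the lexicon-based family concrete, the following is a minimal, hedged sketch of a dictionary-based polarity scorer; the word lists and thresholds are illustrative assumptions, not taken from the surveyed papers:

```python
# Hypothetical minimal dictionary-based (lexicon) sentiment scorer.
POSITIVE = {"good", "great", "excellent", "wonderful", "love"}
NEGATIVE = {"bad", "terrible", "awful", "boring", "hate"}

def lexicon_polarity(text: str) -> str:
    """Count positive and negative lexicon hits and return a polarity label."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_polarity("a great and wonderful movie"))  # positive
print(lexicon_polarity("the plot was boring and bad"))  # negative
```

Real lexicon-based systems additionally handle negation, intensifiers and weighted word lists; the point here is only that the decision is directly traceable to individual words, unlike in the machine learning group.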

To judge the efficiency of a method we can use the standard evaluation metrics, such as precision, recall, F-measure and accuracy. Usually, more complicated methods obtain better results than linear or simple ones. On the other hand, such methods are hard to explain: it is not obvious how a single opinion or statement affects the final result. This is the reason for the growing popularity of explainable methods.
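
A small sketch of how these metrics can be computed with scikit-learn for a hypothetical set of gold labels and predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical gold labels and model predictions (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```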

2.2 Explainable Artificial Intelligence

Artificial intelligence has appeared in many aspects of our lives, e.g. medicine, transport, e-commerce, smart homes, etc. AI systems can help a doctor analyze X-ray or magnetic resonance images [13], support car drivers [16], recommend personalized products or services [25], allow us to “talk” with ChatGPT [9], etc.

The main contribution of XAI is to increase the user's trust in AI systems. User confidence is crucial in many situations, especially when the results of these systems affect our health or even our lives.

The main idea of XAI is to explain why the system obtained a particular result. It can be illustrated with Albert Einstein's quote: “If you can't explain it simply, you don't understand it well enough”. This is an important aspect of many deep learning algorithms, where it is not obvious what information the network contains or why a particular input leads to a particular output [14].

The most frequent division of XAI approaches is into two groups: visualization methods and post-hoc analysis. In the first group, there are a few algorithms that do not need any additional explanation, as they are transparent enough: linear or logistic regression, decision trees, kNN, rule-based learners, general additive models, and Bayesian models [3]. The post-hoc analysis category contains more sophisticated methods that do not allow one to easily explain why a particular case was classified into a particular class, e.g. tree ensembles, SVMs and deep neural networks (multi-layer, convolutional or recurrent). Usually, the following techniques are used to explain how they work in the post-hoc step: model simplification, feature relevance, local explanations or visualization.
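
As a small illustration of such a transparent model (a sketch on a standard toy dataset, unrelated to the surveyed papers), a shallow decision tree can be printed directly as human-readable rules, so no post-hoc explanation method is required:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow decision tree on a toy dataset and print its learned rules.
data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
print(export_text(tree, feature_names=list(data.feature_names)))
```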

Athira et al. [4] differentiate two concepts: interpretability and explainability. In the first case, the model has a simple structure that can itself be used to interpret or explain how the method works (e.g. linear models, decision trees, association rules). It assumes that the algorithms or methods used are transparent and do not need any additional explanation. This is also called model-based explainability, or explainability by design [24]. The post-hoc explanation category tries to explain how a black box (an algorithm or a method) works based on its final results [38]. This is crucial for non-linear models: ensemble methods or neural networks (e.g. CNN, RNN [2]).

Arrieta et al. [3] and Ding et al. [8] define further aspects of explainability: understandability – the user can understand how the algorithm works without any additional explanation of its internal structure; comprehensibility – the result of the learning algorithm should be understandable for a human, which is also connected with model complexity; transparency – the model is understandable by itself.

Another division of XAI models is into global and local explanations [38]. Global explanation aims to show how the input variables influence the model as a whole. Local explanation focuses on how each feature influences a single result (e.g. the SHAP algorithm [20]).
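
As a minimal, hedged sketch of a local explanation (assuming a hypothetical tree-based regressor trained on synthetic tabular features, not a model from the surveyed papers), SHAP values can be computed as follows:

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: 200 samples, 5 numeric features, continuous sentiment score.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.tanh(X[:, 0] + 0.5 * X[:, 1])

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer decomposes each individual prediction into per-feature
# contributions (Shapley values) – a local explanation.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:3])   # one row of contributions per instance

print(np.round(shap_values, 3))  # shape: (3 instances, 5 features)
```

Summing a row of contributions together with the explainer's expected value recovers the model's prediction for that instance, which is what makes the attributions additive.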

Dazeley et al. [7] claim that a full XAI system should implement two processes: social and cognitive. The first process should take into account interactions with other actors such as people, animals, other agents, etc. The cognitive process should identify general causes and counterfactuals [11].

The authors propose the following levels of explanation, depending on the user's beliefs and motivations [7]:

  • Reactive: it is an explanation of an agent’s reaction to immediately perceived inputs – like the instinctive behaviour of animals in a dangerous situation.

  • Disposition: it is an explanation of an agent’s underlying internal disposition towards the environment and other actors that motivated a particular decision – the agent’s decision is based on its beliefs or desires.

  • Social: it is an explanation of a decision based on an awareness or belief of its own or other actors’ mental states.

  • Cultural: it is an explanation of a decision made by the agent based on what it has determined is expected of it culturally, separate from its primary objective, by other actors.

  • Reflective: it is an explanation detailing the process and factors that were used to generate, infer or select an explanation.

The first four levels are object-level explanations based on decisions or arguments, while the last one, reflective, is a meta-explanation based on the scenario structure or on historical decisions and justifications.

In the literature one can find many methods for XAI, but the majority of them can be classified into the lower levels: reactive, disposition or social.

In the next part of this section we present the most popular approaches to XAI. The methods in the visualization group are based on a visual form of explanation, like highlighted text in natural language processing [23] or explicit visualization of the results according to some subsets of features [33]. The aim of post-hoc explanations is to determine feature relevance, model simplification, text explanation or explanation by example [3]. In many cases the post-hoc methods also use visualization approaches.

Visualization. Nowadays, there are more and more methods to train a model, but it is hard to explain why a specific final result was obtained, what the impact of a particular set of features or cases was during the training process, and how they influenced the final prediction mechanism [33].

The visualization approach allows us to look inside the data in a simpler way than with analytical methods. It can provide some intuitions about the data distribution or about differences between some subsets of cases.

So et al. [33] claim that the basis of an explanation is the set of features that can be visualized. They differentiate the following aspects (a small what-if sketch follows the list):

  • feature importance – it measures how a feature impacts the prediction across all observations. The most popular methods are SHAP (SHapley Additive exPlanations) [20] and counterfactual explanations [11];

  • additive variable attributions – it estimates which instances of the dataset are outliers;

  • what-if analysis – one can use a ceteris-paribus plot to analyze the relationship between a feature and the response.
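
A minimal what-if (ceteris-paribus) sketch on a hypothetical regression model: one feature of a single observation is varied over a grid while the remaining features stay fixed, and the resulting predictions form the profile that a ceteris-paribus plot would show.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical data: 3 numeric features, continuous sentiment score as target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = np.tanh(X[:, 0] - 0.3 * X[:, 2]) + 0.1 * rng.normal(size=300)
model = GradientBoostingRegressor(random_state=1).fit(X, y)

# Ceteris-paribus profile: vary feature 0 for a single observation,
# keeping the remaining features fixed at their observed values.
instance = X[0].copy()
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 25)
for v in grid:
    modified = instance.copy()
    modified[0] = v
    pred = model.predict(modified.reshape(1, -1))[0]
    print(f"feature_0 = {v:+.2f} -> predicted score = {pred:+.3f}")
```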

One of the most effective architectures for sentiment analysis is the CNN. Souza et al. [36] compared five attribution techniques for visualizing how a CNN works, in the context of particle image velocimetry (PIV): guided backpropagation (GBP), saliency (SAL), integrated gradients (IGR), input × gradients (IXG) and DeepLIFT (DLF).
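
As a hedged sketch (the model and input are hypothetical, not the setup of [36]), gradient-based attributions such as saliency and integrated gradients can be computed with the Captum library for a tiny PyTorch CNN:

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients, Saliency

# Hypothetical tiny CNN over pre-embedded token sequences
# (batch, channels=1, seq_len=20, embedding_dim=16), binary sentiment logits.
class TinySentimentCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=(3, 16))
        self.pool = nn.AdaptiveMaxPool2d((1, 1))
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        h = torch.relu(self.conv(x))
        h = self.pool(h).flatten(1)
        return self.fc(h)  # logits for {negative, positive}

model = TinySentimentCNN().eval()
x = torch.randn(1, 1, 20, 16, requires_grad=True)
target = model(x).argmax(dim=1).item()

# Integrated gradients: accumulate gradients along a path from a baseline to x.
ig_attr = IntegratedGradients(model).attribute(x, target=target)
# Saliency: absolute gradient of the target logit with respect to the input.
sal_attr = Saliency(model).attribute(x, target=target)

print(ig_attr.shape, sal_attr.shape)  # attributions have the same shape as the input
```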

Post-hoc XAI Methods. The input for a post-hoc XAI method is a trained model. The expected output is an approximate model that explains how the original model works [24]. It can also reflect the decision logic or generate some understandable representation of the model, e.g. a set of rules, feature importance scores or heatmaps.

Most of the XAI methods dedicated to text processing are model-specific approaches [3].

Some exemplary methods are described below [24]:

  • LIME (Local Interpretable Model-agnostic Explanations) – the algorithm introduces perturbations to real samples and observes how the output of the model changes (a short sketch follows this list);

  • If-then rules – they should reflect the dependencies between the features. The generated rules should represent the original black-box model; determining the optimal set of rules is an optimization task.
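
A minimal sketch of LIME applied to a text sentiment classifier, assuming a hypothetical scikit-learn pipeline and a tiny hand-made training corpus (both are illustrative, not from the surveyed papers):

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical tiny training corpus with binary sentiment labels.
texts = ["great movie, loved it", "terrible plot and bad acting",
         "wonderful performance", "boring and awful film"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(texts, labels)

# LIME perturbs the text (removing words) and fits a local linear surrogate model.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "a wonderful but slightly boring film",
    pipeline.predict_proba,   # classifier_fn: list of texts -> class probabilities
    num_features=4,
)
print(explanation.as_list())   # (word, weight) pairs of the local surrogate
```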

The results obtained from post-hoc XAI methods that find dependencies between features can be used for feature selection. The main aim of feature selection methods is to reduce the dimensionality of the dataset and the complexity of the solution. This is possible because a lot of data is redundant [21, 32]. The task is to delete (or omit) some data when doing so does not significantly change the result of the algorithm.
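
A minimal sketch of feature selection on text features (a hypothetical TF-IDF representation filtered with a chi-squared test; the corpus is illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical tiny labelled corpus (1 = positive, 0 = negative).
texts = ["great movie, loved it", "terrible plot and bad acting",
         "wonderful performance", "boring and awful film"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Keep only the k terms with the strongest dependence on the sentiment label.
selector = SelectKBest(chi2, k=4).fit(X, labels)
selected_terms = vectorizer.get_feature_names_out()[selector.get_support()]
print(selected_terms)
```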

The methods and techniques of explainable artificial intelligence presented above focus on features – how a particular feature influences the result. They take care of the form of the explanation, use a subset of features to obtain the result, and they are separated from the model [11].

These methods also have disadvantages: they cannot show us, e.g., what the minimal set of samples or instances is that guarantees the obtained results [41], and when using them it is not clear which input instance determined the final result.

3 Explainable AI in Sentiment Analysis

Sentiment analysis is an area where transparency is a crucial factor in the user's trust in the system [2]. Before a user makes a purchase decision or decides to use a service, he or she may check opinions about the topic, product or delivered service, etc.

Explainable artificial intelligence techniques allow us to better understand the predictions of a model [12]. More and more methods and models in this area are predictive; to increase the user's confidence in the system, it should provide transparent and trustworthy results. As the authors note, more effective algorithms usually mean less transparency.

The main objective of XAI methods in the sentiment analysis area is to answer the question: “How can XAI methods reveal potential bias in trained machine learning models for the prediction of product ratings?” [34].

In this section we present a classification of the existing solutions for XAI in the sentiment analysis domain. The most commonly used methods focus on the following aspects (a small sketch of feature importance and partial dependence follows the list) [34]:

  • Feature importance – it approximates the global relevance of a feature in the model. The computation depends on the model: e.g. for tree-based models it is derived from the splits of the tree, and for linear models it is correlated with the regression coefficients.

  • Local attributions – this approach allows one to visualize the impact of a single feature’s variance, which can be missed by the analysis of global feature importance.

  • Partial dependency plot – it presents how each feature or several features can impact the final result.
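
The sketch below illustrates the first and third aspects on a hypothetical regression model (synthetic numeric features standing in for review properties): permutation-based feature importance and partial dependence, both computed with scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance, partial_dependence

# Hypothetical numeric features describing reviews (e.g. length, rating, emoji count)
# and a continuous sentiment score as target.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = np.tanh(2 * X[:, 0]) + 0.2 * X[:, 1] + 0.05 * rng.normal(size=300)

model = RandomForestRegressor(n_estimators=100, random_state=2).fit(X, y)

# Global feature importance: how much the score drops when a feature is shuffled.
result = permutation_importance(model, X, y, n_repeats=10, random_state=2)
print("permutation importances:", np.round(result.importances_mean, 3))

# Partial dependence of the prediction on feature 0 (averaged over the data).
pd_result = partial_dependence(model, X, features=[0], grid_resolution=10)
print("partial dependence values:", np.round(pd_result["average"][0], 3))
```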

The above-mentioned methods are based on the visualization of the results. They can be used both to explain how the model works and to improve it: a feature can be excluded from the model when it is not important, has too high a variance or has a weak relationship with the other attributes.

Interpretability or explainability can be improved mostly through high transparency of the model, which can be derived from the structure of the network, feature importance, local gradient information, redistribution of the function's value over the input variables, or specific propagation rules for neural networks [2].

In Table 1 we summarize existing papers focused on XAI in SA. Each paper is analyzed according to its main problem, the features and techniques used for sentiment analysis, and the type of explainability.

All these papers develop models for predicting sentence polarity. Most of them work on text reviews of documents or movies, or simply on tweets, using a wide range of data models and methods for sentiment analysis: naive Bayes, decision trees, random forests, LSTM, softmax attention, neural networks (CNN, RNN, etc.).

The most popular approach to providing explanations of the results is visualization: SHAP, BertViz [12], LIME [12, 35], feature importance [19, 26, 40], local feature attributions and partial dependency plots [34], contextual importance and utility [30].

Table 1. Summary of the XAI methods in SA problems.

4 Trends and Challenges

The area of XAI methods is being developed more and more to ensure transparent and reliable results that the user can trust. There are still many aspects that should be taken into account.

The challenges of XAI methods in sentiment analysis are correlated with the development of new methods for SA, especially deep neural network approaches. As they become more and more popular and are used by a wider and wider group of people (who sometimes use them without thinking about, or being aware of, how they work), it is important to take care of the responsibility of the results. Arrieta et al. [3] highlighted the need to prepare and follow a set of principles that should be satisfied. They call this trend responsible AI – it should include the following issues: fairness, privacy, accountability, ethics, transparency, security and safety.

The XAI algorithms used in the presented papers focus on visualization approaches. This allows us to see the impact of a feature or a set of features on the final result. It increases transparency and can help to reduce the dimensionality of the problem.

More and more sophisticated algorithms appear that take into account more information and obtain more accurate results. Unfortunately, they do not focus on interpretability.

Most responsible AI aspects have still not been introduced into SA methods. Users would like to have trustworthy methods for opinion mining, so explainable sentiment analysis is a promising area of investigation. Due to the wide variety of SA methods, better explainable algorithms should also be created.

5 Summary

In this paper we have presented the explainable artificial intelligence methods that are used in sentiment analysis. We have given definitions and an overview of the existing methodologies used in SA. The second part focused on explainable methods, which are more and more popular in the general area of artificial intelligence. Finally, we presented exemplary research articles that use XAI methods in opinion mining.

Most of the presented papers use only visualization methods to help the user interpret the results, so this remains a promising research domain.