
1 Introduction

Sentiment analysis, also known as opinion mining [1], is an area of information processing that has been successfully applied in domains such as medicine. For example, there are several works that use opinion mining to analyze the emotional reaction of patients regarding different aspects of diabetes [2] and asthma [3]. The systematic review of the literature presented in this paper focuses on the use of sentiment analysis in the education domain.

Data retrieval techniques [4] mainly focus on processing, searching, and extracting factual information from digital educational resources or learning environments [5], such as blogs, forums, and social networks. Such data have both an objective and a subjective perspective. On the one hand, the objective perspective is not influenced by emotions, opinions, or personal feelings; that is, it is based on facts, on things that are quantifiable and measurable. On the other hand, the subjective perspective is open to greater interpretation based on personal feelings, emotions, aesthetics, etc. Sentiment analysis focuses on analyzing the subjective perspective of data.

The sentiment analysis process is divided into four core phases: data acquisition, data preparation, review analysis, and sentiment classification. There are two main sentiment analysis approaches: (1) machine learning, which is divided into supervised and unsupervised approaches, and (2) the lexicon-based approach, which is divided into two categories, dictionary-based and corpus-based approaches [7].

Supervised machine learning uses techniques or algorithms such as Naive Bayes (NB) [8], one of the simplest and most widely used classifiers, which calculates the posterior probability of a class based on the distribution of words in a document. This algorithm uses Bayes' theorem to calculate the probability that a document belongs to a particular class (tag), given the words it contains. Support Vector Machine (SVM) classifiers are also used in sentiment analysis. SVMs are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Neural networks are also used in sentiment analysis [10]. Training a neural network requires a large corpus of positive, negative, and neutral opinions collected from data sources such as social networks or forums. Once the training phase is completed, the network can classify a new opinion as positive, negative, or neutral. Finally, the Maximum Entropy (ME) technique [11] calculates the probability that a text belongs to a category. To do so, among all models consistent with the training data, it chooses the one with maximum entropy, in order to avoid introducing unwarranted bias into the system. Unlike NB, this method does not assume independence between features or terms.
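To make the Naive Bayes description concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written from scratch in Python. The function names, the smoothing choice, and the toy training opinions are assumptions made for illustration; they are not taken from the reviewed works. Each document is scored with the log prior of a class plus the sum of the log probabilities of its words given that class, which is where the word-independence assumption enters.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Estimate class priors and per-class word counts from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word frequencies in that class
    vocabulary = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return priors, word_counts, vocabulary


def classify(tokens, priors, word_counts, vocabulary):
    """Return the class with the highest log posterior, using add-one smoothing."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)  # log P(class)
        for w in tokens:
            if w in vocabulary:  # words never seen in training are ignored
                # log P(word | class); words are treated as independent given the class
                score += math.log((word_counts[c][w] + 1) / (total + len(vocabulary)))
        scores[c] = score
    return max(scores, key=scores.get)


# Toy, hypothetical training opinions (not data from the reviewed studies).
train = [
    ("the course was great and clear".split(), "positive"),
    ("great teacher very helpful".split(), "positive"),
    ("boring lectures and unclear slides".split(), "negative"),
    ("the quiz was unclear and boring".split(), "negative"),
]
docs, labels = zip(*train)
model = train_naive_bayes(docs, labels)
print(classify("clear and helpful course".split(), *model))  # → positive
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.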

The lexicon-based approach classifies a text according to the positive, negative, and neutral words it contains. This approach does not require a training phase. As mentioned earlier, the lexicon-based approach can be divided into two categories: dictionary-based and corpus-based approaches. On the one hand, the corpus-based approach tries to find co-occurring word patterns to determine the polarity of a text. On the other hand, the dictionary-based approach uses synonyms, antonyms, and hierarchies found in a lexical database. The lexicon-based approach uses techniques such as specialized vocabularies [12] and dictionary construction. For instance, in [13] the authors propose an emotional dictionary for sentiment analysis of online news. Another example of dictionary construction is presented in [14], where the authors propose a dictionary for sentiment analysis based on common-sense knowledge.
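As a rough illustration of the dictionary-based idea, the sketch below scores a text by summing the polarity of the words it finds in a lexicon, with no training phase. The tiny `LEXICON` and the function `lexicon_polarity` are hypothetical; a real dictionary-based method would usually expand a seed list like this through synonyms and antonyms from a lexical database, and a corpus-based method would instead infer word polarities from co-occurrence patterns.

```python
# Hypothetical miniature polarity lexicon; real systems use curated resources.
LEXICON = {
    "good": 1, "great": 1, "clear": 1, "helpful": 1, "interesting": 1,
    "bad": -1, "boring": -1, "unclear": -1, "confusing": -1, "difficult": -1,
}


def lexicon_polarity(text, lexicon=LEXICON):
    """Sum the polarities of known words; unknown words count as 0."""
    tokens = text.lower().split()
    score = sum(lexicon.get(tok.strip(".,!?;:"), 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("The lectures were clear and interesting."))  # → positive
print(lexicon_polarity("Boring and confusing exercises!"))  # → negative
print(lexicon_polarity("The exam is on Monday."))  # → neutral
```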

E-learning is the delivery of education through digital or electronic means, allowing students to acquire new knowledge and develop new skills. E-learning allows students to expand their knowledge anytime and anywhere. Kechaou [15] considers sentiment analysis an alternative way to improve the learning process in an e-learning environment, since it allows analyzing students’ opinions in order to better understand them and take more effective, better-targeted actions. Hence, it is important to analyze the use of sentiment analysis in the education domain. Although several works currently present literature reviews of sentiment analysis, there is still no proposal that presents a systematic literature review of sentiment analysis in the education domain.

The remainder of this work is structured as follows: Sect. 2 presents the research methodology followed in this literature review. Section 3 describes the execution of the systematic review, while Sect. 4 presents our results. Finally, our conclusions are presented in Sect. 5.

2 Systematic Review Planning

The literature review presented in this work has three main objectives: (1) to identify the techniques and classification algorithms used for sentiment analysis in the education domain; (2) to identify the digital educational resources or learning environments that serve as data sources for sentiment analysis; and (3) to identify the most used techniques and data sources for sentiment analysis in the education domain.

2.1 Research Questions

For the purposes of this literature review, three research questions were defined to guide the research and help meet the established objectives. The research questions are listed below:

  • RQ1. What is the sentiment analysis process?

  • RQ2. What approaches and digital educational resources are used in sentiment analysis?

  • RQ3. What are the main benefits of using sentiment analysis in the education domain?

2.2 Digital Libraries

Table 1 shows the digital libraries used to perform the systematic literature review. This table also presents the type of bibliographic source, language, publication period, and search strategy used in this work. As can be observed, a keyword-based search strategy was used to search for research works focused on sentiment analysis in the education domain. This strategy is described in detail in the next section.

Table 1. Digital libraries.

2.3 Search Strategy

To answer the research questions, we used a keyword-based search strategy. For this purpose, we identified a set of keywords related to sentiment analysis in the education domain, as well as synonyms for these keywords. Once these terms were defined, we combined them with the connectors “AND” and “OR”, resulting in the following search string:

(sentiment analysis) AND (sentiment classification OR sentiment analysis techniques OR opinion mining OR education domain) AND/OR (digital educational resource) AND/OR (students) AND/OR (university)

Finally, only works published in the 2013–2018 period were considered in this work, as specified in Table 1.

2.4 Exclusion Criteria

We discarded papers that were not directly related to sentiment analysis and the education domain. We also applied the following exclusion criteria:

  • Research works not written in English.

  • Master’s theses and doctoral dissertations.

  • Duplicate research works retrieved from both Google Scholar and Web of Science.
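The period filter and the exclusion criteria above are mechanical enough to be sketched in code. The snippet below is a hypothetical screening step over bibliographic records; the record fields (`title`, `year`, `language`, `type`, `source`), the sample records, and the title-normalization rule used to detect duplicates are all assumptions. The first criterion (direct relevance to sentiment analysis in education) still requires reading each paper, so it is not automated here.

```python
import re


def normalize_title(title):
    """Lowercase and replace punctuation with spaces so duplicate titles match."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def screen(records, first_year=2013, last_year=2018):
    """Apply the period filter and exclusion criteria; keep the first copy of duplicates."""
    seen = set()
    kept = []
    for rec in records:
        if not first_year <= rec["year"] <= last_year:
            continue  # outside the review period
        if rec["language"] != "English":
            continue  # not written in English
        if rec["type"] in {"master thesis", "doctoral dissertation"}:
            continue  # dissertations are excluded
        key = normalize_title(rec["title"])
        if key in seen:
            continue  # same work already retrieved from another library
        seen.add(key)
        kept.append(rec)
    return kept


# Hypothetical candidate records, for illustration only.
candidates = [
    {"title": "Sentiment Analysis of MOOC Forums", "year": 2016,
     "language": "English", "type": "article", "source": "Google Scholar"},
    {"title": "Sentiment analysis of MOOC forums.", "year": 2016,
     "language": "English", "type": "article", "source": "Web of Science"},
    {"title": "Análisis de sentimientos en educación", "year": 2017,
     "language": "Spanish", "type": "article", "source": "Google Scholar"},
    {"title": "Opinion mining for e-learning", "year": 2015,
     "language": "English", "type": "doctoral dissertation", "source": "Web of Science"},
    {"title": "Early opinion mining study", "year": 2010,
     "language": "English", "type": "article", "source": "Google Scholar"},
]
print([r["title"] for r in screen(candidates)])  # → ['Sentiment Analysis of MOOC Forums']
```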

3 Systematic Review Execution

This section presents the execution of the systematic review, which consisted of searching the selected digital libraries for research works related to sentiment analysis and the education domain, and evaluating the retrieved studies against the inclusion and exclusion criteria. This review also allowed us to answer the research questions presented in Sect. 2.1. These answers are discussed in the following sections.

3.1 RQ1. What is the Sentiment Analysis Process?

Sentiments

Sentiments are attitudes, thoughts, or judgments triggered by sensations or mental processes. Sentiments are shaped by each person’s experiences and are generated in the subconscious. Sentiments are also durable and recurrent, since they remain in emotional memory [16]. Sentiment analysis aims to assign a sentiment polarity to a text, in this case to texts written by students. Sentiment polarity indicates whether a message expresses a positive, negative, or neutral sentiment [17]. Sentiment analysis can be performed at three levels: document, sentence, and entity level.
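The difference between analysis levels can be seen with a small example: a review whose positive and negative statements cancel out looks neutral at the document level, while sentence-level analysis recovers both opinions. The lexicon, the regular-expression sentence splitter, and the function names below are assumptions for illustration only. Entity-level analysis, which would attach each polarity to its target (here, the videos and the forum), is not shown.

```python
import re

# Hypothetical miniature lexicon, for illustration only.
LEXICON = {"clear": 1, "helpful": 1, "great": 1,
           "boring": -1, "confusing": -1, "slow": -1}


def polarity(text):
    """Label a span of text by summing the lexicon scores of its words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentence_level(document):
    """Split after sentence-final punctuation and label each sentence separately."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(s, polarity(s)) for s in sentences]


review = "The videos were clear and helpful. The forum was boring and confusing."
print(polarity(review))        # document level → neutral
print(sentence_level(review))  # sentence level: one positive, one negative sentence
```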

Sentiment Analysis Process

Figure 1 shows the sentiment analysis process, which is divided into four main phases: data acquisition, data preparation, review analysis, and sentiment classification. These phases are described below:

Fig. 1. Sentiment analysis process.

  • Data acquisition can apply data mining techniques used in the education domain [18], since data can be extracted from digital educational resources such as MOOC forums.

  • Data preparation phase, also known as data preprocessing [19], is a necessary step for sentiment classification [20]. This phase consists of cleaning and preparing the text for classification. For instance, online texts usually contain a lot of noise and uninformative parts such as HTML tags, scripts, and advertisements. In addition, at the word level, many words in a text have no impact on its overall orientation.

  • Review analysis phase analyzes the linguistic features of reviews so that relevant information can be identified. This phase also aims to select the words that will be used in the last phase of the sentiment analysis process.

  • Sentiment classification phase classifies a new opinion as positive, negative, or neutral by applying machine learning, lexicon-based, or hybrid approaches.
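The four phases can be chained as in the following sketch. It is a simplified, hypothetical pipeline: acquisition returns a fixed in-memory sample instead of scraping a MOOC forum, preparation strips HTML tags, decodes HTML entities, and removes a small illustrative stopword list, review analysis keeps only words found in a toy lexicon, and classification uses that lexicon, although a trained machine learning or hybrid classifier could be substituted in the last step. All names and data are assumptions.

```python
import html
import re

# Hypothetical stand-ins; a real system would collect posts from a MOOC forum.
RAW_POSTS = [
    "<p>The assignments were <b>great</b> and very helpful!</p>",
    "<div>Honestly the lectures are boring &amp; confusing.</div>",
]
STOPWORDS = {"the", "were", "and", "very", "are", "honestly"}
LEXICON = {"great": 1, "helpful": 1, "clear": 1,
           "boring": -1, "confusing": -1, "unclear": -1}


def acquire():
    """Phase 1, data acquisition: here just a fixed in-memory sample."""
    return list(RAW_POSTS)


def prepare(post):
    """Phase 2, data preparation: strip tags, decode entities, tokenize, drop stopwords."""
    text = html.unescape(re.sub(r"<[^>]+>", " ", post))
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def analyze(tokens):
    """Phase 3, review analysis: keep only the opinion-bearing words."""
    return [t for t in tokens if t in LEXICON]


def classify(features):
    """Phase 4, sentiment classification: lexicon-based polarity."""
    score = sum(LEXICON[t] for t in features)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for post in acquire():
    features = analyze(prepare(post))
    print(features, classify(features))
```

For the two sample posts this prints `['great', 'helpful'] positive` and `['boring', 'confusing'] negative`.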

3.2 RQ2. What Approaches and Digital Educational Resources are Used in Sentiment Analysis?

Figure 2 shows the sentiment analysis approaches used in the education domain according to the literature review performed. Two main sentiment analysis approaches are used in this domain: the machine learning and lexicon-based approaches. On the one hand, the machine learning approach can be divided into supervised and unsupervised approaches. Regarding the supervised approach, several classifiers are used in the education domain, such as decision tree, linear, rule-based, and probabilistic classifiers. On the other hand, the lexicon-based approach uses dictionary-based and corpus-based techniques.

Fig. 2. Sentiment analysis approaches.

Table 2 shows the works analyzed in this literature review. This table presents the year of publication; the sentiment analysis approach, classifier, and techniques used by the authors; the sentiment analysis level adopted (document, sentence, or entity); and the precision achieved by each proposed sentiment analysis process.
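For reference, the standard definitions of precision for a polarity class $c$ (for example, positive) and of overall accuracy are given below, where $TP_c$ counts opinions correctly assigned to $c$ and $FP_c$ counts opinions wrongly assigned to $c$:

```latex
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c},
\qquad
\mathrm{Accuracy} = \frac{\text{number of correctly classified opinions}}{\text{total number of classified opinions}}
```

As a purely illustrative example, if a classifier labels 40 forum posts as positive and 34 of them are truly positive, its precision for the positive class is $34/40 = 0.85$.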

Table 2. Approaches, techniques and levels of sentiment analysis.

Japtap [22], who employed different techniques for sentiment analysis at the sentence level, concluded that a user’s sentiment cannot be reliably determined from an isolated sentence, such as one describing something as brilliant or boring. In this sense, the author compared sentiment analysis techniques and established that each technique achieves a different percentage of accuracy when determining a person’s sentiment.

Table 3 presents a set of the works analyzed in this literature review. This table aims to identify the digital educational resources most used for sentiment analysis in the education domain. As can be seen, the most used resources are MOOC forums, followed by social networks such as Facebook and Twitter.

Table 3. Education resources used for sentiment analysis.

3.3 RQ3. What are the Main Benefits of Using Sentiment Analysis on Education Domain?

Sentiment analysis in the education domain [45] goes beyond simply knowing what students’ sentiments are. Table 4 presents the benefits that sentiment analysis can provide to the education domain. Some of these benefits are an improved learning process, improved performance, reduced course abandonment, an improved teaching process, and greater satisfaction with a course. Furthermore, learning analytics refers to the collection and analysis of information about students and their context, aiming to understand and optimize the learning process and the environment in which it occurs. This information is especially important for e-learning systems, which guide students through the learning process according to their particular needs and preferences. It is also important for teachers, since it allows them to know the emotional state of their students.

Table 4. Benefits that can be provided by sentiment analysis to education domain.

4 Results

Table 5 shows that MOOC forums [46] are the most used resources for sentiment analysis in the education domain. In other words, in this domain, datasets and lexicons are mainly built from MOOC forums. These data are provided as input to the sentiment analysis system so that it can classify new opinions.

Table 5. Digital educational resources used in sentiment analysis on education context.

Table 6 shows the most used techniques for sentiment analysis in the education domain, grouped by the sentiment analysis approach to which they belong. According to the literature review presented in this work, the most used technique under the supervised machine learning approach is Naive Bayes, which commonly achieves higher precision than the other techniques used under this approach. Regarding the lexicon-based approach, dictionary-based techniques are the most used. Finally, Table 6 also shows that machine learning and lexicon-based approaches can be combined to perform the sentiment analysis process.
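One simple way to combine the two approaches is sketched below: the lexicon labels the unlabelled posts it can decide, those pseudo-labelled posts train a Naive Bayes classifier, and the classifier handles texts that contain no lexicon words. This is only one possible hybrid scheme, chosen for illustration and not claimed to be the method of any reviewed work; the lexicon, posts, and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical seed lexicon; any curated polarity dictionary could be used.
LEXICON = {"great": 1, "helpful": 1, "clear": 1,
           "boring": -1, "confusing": -1, "unclear": -1}


def lexicon_label(tokens):
    """Lexicon step: return a polarity, or None when no opinion word decides it."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def train_nb(labeled):
    """ML step: multinomial Naive Bayes counts from lexicon-labelled texts.

    Assumes both classes receive at least one pseudo-labelled text.
    """
    counts = {"positive": Counter(), "negative": Counter()}
    n_docs = Counter()
    for tokens, label in labeled:
        counts[label].update(tokens)
        n_docs[label] += 1
    vocabulary = set().union(*counts.values())
    return counts, n_docs, vocabulary


def nb_label(tokens, model):
    """Pick the class with the highest add-one-smoothed log posterior."""
    counts, n_docs, vocabulary = model
    total_docs = sum(n_docs.values())
    scores = {}
    for label, word_counts in counts.items():
        total_words = sum(word_counts.values())
        score = math.log(n_docs[label] / total_docs)
        for t in tokens:
            if t in vocabulary:
                score += math.log((word_counts[t] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical unlabelled forum posts (already tokenized).
posts = [
    "great lectures and helpful forum".split(),
    "clear lectures and great pace".split(),
    "boring lectures and confusing quizzes".split(),
    "unclear quizzes and boring pace".split(),
]
# The lexicon labels the posts it can decide; these become training data.
pseudo_labeled = [(p, lexicon_label(p)) for p in posts if lexicon_label(p)]
model = train_nb(pseudo_labeled)


def hybrid_label(tokens):
    """Trust the lexicon when it decides; otherwise fall back to the classifier."""
    return lexicon_label(tokens) or nb_label(tokens, model)


print(hybrid_label("the quizzes were a nightmare".split()))  # → negative
```

Because "quizzes" co-occurs only with negative words in the pseudo-labelled posts, the classifier assigns the last text a negative polarity even though it contains no lexicon entry.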

Table 6. Sentiment analysis approaches.

5 Conclusions and Future Work

The systematic literature review presented in this work revealed that several works use sentiment analysis to improve different aspects of the education domain, such as the learning process, students’ performance, course abandonment, the teaching process, and satisfaction with a course. This review also revealed that MOOC forums and social networks such as Facebook and Twitter are the most used digital educational resources for collecting the data needed to perform sentiment analysis. Other educational resources used in sentiment analysis are learning journals, computer texts, software programs, lexical databases, and other electronic documents. Regarding sentiment analysis techniques, Support Vector Machine (SVM) and Naive Bayes are the most used. Finally, we note a trend toward combining the machine learning and lexicon-based approaches to perform the sentiment analysis process.

As future work, we plan to extend this literature review by including a wider set of digital libraries, such as the Wiley Online Library. Furthermore, we plan to define more research questions that help domain experts obtain a better perspective on the use of sentiment analysis in the education domain. This information could help experts propose solutions that address challenges and limitations in the education domain.