1 Introduction

The emotional well-being of students is highly relevant in the educational context. Recognizing and understanding the emotions experienced by students is not only crucial for their personal development but can also have a direct impact on their academic performance and overall success in the educational environment [2]. However, detecting and assessing students’ emotions can be complex and subjective for educators and education professionals. In response to this challenge, image processing supported by artificial intelligence approaches offers a promising tool for objectively and accurately analyzing students’ emotions [5]. This has already been tested in virtual education processes, where such analysis can even occur in real time, enabling early interventions in the educational methodologies and resources employed.

Image processing enables the extraction of key visual features from an image, such as facial expressions, gestures, and body postures, which are closely related to human emotions. By combining computer vision and machine learning techniques, it is possible to develop algorithms capable of identifying and classifying the emotions expressed by students in various educational situations and contexts [1].

This work explores the possibility of classifying and analyzing emotions in university students based on images extracted from recordings of in-person classes stored on an educational support platform. It is a preliminary approach to building an emotion analysis model, and therefore an experimental effort involving classification techniques. At this stage, it does not yet propose a tool that can be fully implemented in the classroom. Additionally, the advantages of transfer learning [13] are examined as a means of domain adaptation, using a validation dataset separate from the model’s training dataset. The goal is to fine-tune the pre-trained model to the target data by transferring prior knowledge and adapting to the new data. However, it is important to consider that image quality and the interpretation of emotions can be influenced by conditions such as lighting or the presence of facial obstructions. External factors, such as students’ mood or personal situations, may also affect facial emotional expression [8].

The remainder of the article is organized as follows. Section 2 reviews studies and practical applications that have used image processing to analyze students’ emotions in educational settings. Section 3 covers the methodology, including data selection and extraction, as well as the experiment setup. Section 4 presents and discusses the results, and Sect. 5 closes the document with the conclusions and future work.

2 Literature Review

Several studies have shown the benefits of image processing for detecting emotions in educational environments. A student’s state of mind can influence the learning process [7], so knowing which emotions a student is experiencing in a class can help the teacher propose suitable methodological strategies and didactic mediations.

In [4], a significant experimental benchmark is proposed for research on students’ emotion recognition and the graphical visualization of facial expressions in a virtual learning environment. The paper explores speech and images by comparing the performance of several deep learning neural network algorithms, and it proposes an improved convolutional neural network with bidirectional long short-term memory that achieves satisfactory performance on the case study addressed.

In [10], a framework is proposed that combines a facial expression recognition (FER) algorithm with online course platforms. Pictures of students’ faces are taken through the cameras of the devices they use to attend classes, and the expressions are analyzed and classified into 8 types of emotions (anger, contempt, disgust, fear, happiness, neutrality, sadness, and surprise). The authors evaluated the framework on a course with 27 students conducted on the Tencent Meeting platform, and the results show that the model, based on Convolutional Neural Networks (CNN), is robust in various environments. This suggests that facial expression recognition could be an effective tool for understanding students’ emotions during online classes.

The work in [9] analyzes behavior and searches for patterns in the oral presentations of a group of students by applying sequential pattern mining techniques. The analysis segmented the students into three groups according to their body postures. Sequential pattern mining provided a complementary perspective for data evaluation and helped to reveal the students’ most frequent postural sequences.

Another interesting work is presented in [6], which proposes the integration of two models, one for emotion recognition and the other for attention analysis, to facilitate monitoring during a student’s interaction in virtual environments. This integration was carried out on a web platform and the results indicate that the platform could be used by teachers as knowledge mediators, since they could understand the behavior of students in virtual environments, whether synchronous or asynchronous, and take actions to improve the learning experience of students.

In relation to transfer learning, the study in [3] proposes a two-phase approach to develop an emotion recognition model and address the challenge of data scarcity in this field. Experimental results show a significant improvement in performance when the transfer learning strategy is applied to a Convolutional Neural Network (CNN): recognition efficiency increased from 86.38% to 95.89%.

Other works, such as those presented in [11] and [12], also recognize the usefulness and benefits of knowledge transfer for representing facial expressions and recognizing emotions. This approach is still being consolidated, as are the libraries, algorithm implementations, and tools that support it.

A review of previous studies highlights the importance of understanding students’ emotions in educational environments. Existing approaches have demonstrated efficacy in emotion recognition through image processing and facial expression analysis. However, they present limitations in terms of generalization, personalization, and data availability. The approach proposed in this manuscript seeks to overcome these limitations by improving performance and adaptability in a technology-assisted face-to-face educational environment.

3 Methodology

A data mining process was carried out following the traditional steps of the KDD (Knowledge Discovery in Databases) methodology. Experiments were run with different hyperparameter configurations, and models were generated with three classifiers (KNN, CNN, and Random Forest), whose performance was compared using the accuracy, recall, and F1 metrics.
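As an illustration of how the three metrics named above can be computed for multi-class emotion labels, the following is a minimal sketch in plain Python. It assumes macro-averaging over classes, which the text does not specify; the function name `classification_metrics` and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for two label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a false positive
            fn[truth] += 1   # true class suffered a false negative
    recalls, f1_scores = [], []
    for label in labels:
        support = tp[label] + fn[label]      # samples that truly belong here
        predicted = tp[label] + fp[label]    # samples assigned to this class
        recall = tp[label] / support if support else 0.0
        precision = tp[label] / predicted if predicted else 0.0
        total = precision + recall
        recalls.append(recall)
        f1_scores.append(2 * precision * recall / total if total else 0.0)
    accuracy = sum(tp.values()) / len(y_true)
    # Macro-averaging (an assumption): every class weighs the same.
    return accuracy, sum(recalls) / len(labels), sum(f1_scores) / len(labels)


# Hypothetical labels, for illustration only.
y_true = ["happy", "happy", "sad", "sad", "neutral", "neutral"]
y_pred = ["happy", "sad", "sad", "sad", "neutral", "happy"]
accuracy, macro_recall, macro_f1 = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} recall={macro_recall:.3f} f1={macro_f1:.3f}")
```

Macro-averaging treats every emotion class equally, which can be a reasonable choice when the classes are imbalanced; a weighted average would instead favor the most frequent emotions.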

By using these classifiers, we seek to determine through comparative analysis whether the Convolutional Neural Network (CNN), being specifically designed for image and video processing, achieves better performance than the other evaluated models.

Additionally, a first approach to transfer learning was made: the models were trained on an openly available dataset for emotion classification and validated with the data extracted from the case study source.
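The cross-dataset protocol described here (train on one source, validate on another) can be sketched with a small k-nearest-neighbours classifier in plain Python. This is only an illustration under stated assumptions: the two-dimensional feature vectors, the labels, and the names `source_x`, `target_x`, and `knn_predict` are hypothetical, and the sketch covers only the train-on-source, evaluate-on-target step, not the fine-tuning of a neural network.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical "source" data, standing in for the open emotion dataset.
source_x = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (1.0, 1.0), (0.9, 1.2), (1.1, 0.8)]
source_y = ["neutral", "neutral", "neutral", "happy", "happy", "happy"]

# Hypothetical "target" data, standing in for the classroom images.
target_x = [(0.3, 0.2), (0.4, 0.4), (1.2, 1.1), (0.6, 0.5)]
target_y = ["neutral", "neutral", "happy", "happy"]

# The model only ever sees source data; target data is used for validation.
predictions = [knn_predict(source_x, source_y, q) for q in target_x]
target_accuracy = accuracy(target_y, predictions)
print(predictions, target_accuracy)
```

In this toy setting the last target sample lies between the two source clusters and is misclassified, which illustrates how a shift between the training and validation domains can lower performance and why adapting the model to the target data may help.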

This applied research is developed around a case study: the analysis of images obtained in a course of the Systems Engineering program at the University of Medellin during semester 2022-2. The source material consists of videos captured in the classroom, from which a relevant set of images was extracted for our research. These videos were obtained from the cameras installed in the classrooms, which helps ensure the quality of the data. Regarding data privacy, these classes are uploaded to the u-virtual platform and, under an agreement established at the time of enrollment at the university, they are accessible to the university community. Two data collection strategies were used: the model training data were obtained from a free external repository, and the test data were collected from the group described above as the case study.

Figure 1 shows the general scheme of the process. It identifies the flow between the data obtained for training the models, the collection of the test data, the training process that generates results, and the pre-trained model that is subsequently used to perform the transfer with the test data extracted from the case study described above.

Fig. 1. Scheme of the process carried out.

4 Results

After applying the techniques, the results show that models pre-trained on a generic dataset of emotions identified in facial expressions achieve a medium level of effectiveness in identifying the predominant emotion of students in images extracted from recordings of in-person classes. Together, these results support the utility and potential of applying these techniques in scenarios where image-based emotional analysis is required. The performance metrics obtained for the three implemented models are presented below (see Table 1).

Table 1. Evaluation metrics.

The ROC curves are shown in Fig. 2, which illustrates the relationship between the true positive rate and the false positive rate. These curves allow an effective evaluation and comparison of the performance of the models in classification problems. The area under the ROC curve (AUC) is also highlighted, a metric that quantifies the quality of the models in terms of their classification capacity.
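To make the relationship between the ROC curve and its area concrete, the following minimal sketch builds the curve by sweeping the decision threshold and integrates it with the trapezoidal rule. It assumes a binary setting, for example one emotion against the rest, which is a common way to draw ROC curves for multi-class problems but is not confirmed by the text; the function names and the toy scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by lowering the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative")
    ranked = sorted(zip(scores, y_true), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Integrate the ROC curve with the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Hypothetical one-vs-rest example: 1 = target emotion, 0 = any other.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(y_true, scores)
auc = area_under_curve(fpr, tpr)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so the value gives a threshold-independent way to compare classifiers.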

In the implementation of the three classification algorithms, it was observed that the CNN model demonstrated superior performance compared to the Random Forest and KNN algorithms. Likewise, after applying Transfer Learning, an improvement in the efficiency of the CNN model was evident. On the other hand, the KNN algorithm also experienced an increase in its efficiency but remained below the CNN. These results suggest that the use of Transfer Learning can enhance the performance of classification models for the processing of emotional images.

Fig. 2. ROC curve.

These results support the usefulness of the applied techniques and their potential for various applications. For example, in educational settings, the developed models can be used to identify learning situations that generate positive or negative emotions in students, allowing timely intervention to improve the learning experience.

A comparative analysis of the metrics presented in Table 1 clearly shows that the CNN model exhibits the greatest efficiency, including when transfer learning is applied, reaffirming its position as the most suitable model for the case study. These results coincide with the evidence found in previous studies [8] and [11], where a Convolutional Neural Network (CNN) also obtained superior results in efficiency. However, despite these achievements, further work is needed to improve the efficiency obtained in our case study, considering that the results reported in the literature review exceed 85%.

5 Conclusions

In the processes of classifying educational data, aspects such as the observation of features (that is, exploratory analysis), the verification of transformation needs, and the verification of data balance are fundamental. In this work, the interpretation of emotions present in an image can be subjective and vary among different observers. In image analysis, the consistency and reliability of results can also be affected by other aspects, such as the selection of the partitioning method for the training stage, the definition and execution of experiments, and the strategy for evaluating the results.

The use of transfer learning in image analysis yields favorable results, proving to be an effective way to enhance model performance, especially in situations where there is a limited dataset available or rapid adaptation to new tasks or applications is required. For the experiments outlined in this work, where a comparison of various classification techniques was carried out, it is found that transfer learning is a viable option for training models in problems where obtaining a significant dataset is challenging.

As future work, the goal is to expand the test dataset taken from the virtual learning environment that supports in-person teaching processes. Additionally, more detailed tracking of a specific subject, along with the visual records, is planned, and these records may be accompanied by sociodemographic characterization data of the student group and the didactic activities carried out at the time of video and image capture. Furthermore, the intention is to incorporate other relevant factors, such as precise emotion interpretation, to achieve more efficient results in the analysis.