1 Introduction

Human behavior recognition and action understanding is a complicated task, as it involves diverse actions, the challenge of capturing appropriate information from videos, and external factors such as lighting conditions, camera angle, and human appearance. Facial expressions play a central role in human behavior recognition. Information is perceived through both verbal and non-verbal channels, and facial expression is conveyed through the non-verbal channel. Facial expression recognition is the technique of detecting such expressions and classifying them into anger, fear, surprise, sadness, happiness, etc. Conclusions about a person’s behavior are based on these expressions [1]. The ability to perceive a person’s emotional state is critical in recognizing human competence [2]. Today, the learning community focuses on student work and on faculty working collaboratively to achieve high learning quality. Hence, digital communication, which enables learning communication, gives learning a new dimension. "Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning" [3].

Learning can be supervised, semi-supervised, or unsupervised. According to Hasani’s research [4], emotions offer a significant layer of information that can be used for emotion detection when synthetic features fail. Teachers use facial expressions and technology to assess students’ level of understanding, with learners’ facial expressions playing a crucial role in this process. Various deep learning algorithms have been employed to detect facial expressions, including deep learning networks (DLN) and convolutional neural networks (CNN) [5]. Other studies have also indicated that CNN is one of the most effective and widely used methods for emotion detection [4]. In addition, [6] has focused on determining the best model for emotion detection, while [7] has elaborated on the role of neural networks in emotion detection.

CNN is a multi-layer perceptron (MLP) variant specialized in capturing the relationships between pixels of an image. Its architecture comprises several layers tailored for image processing tasks; these layers perform segmentation, feature extraction, and classification while requiring minimal preprocessing of the input image [8]. The initial CNN architectures for emotion detection were designed for frontal images with facial expressions and were relatively straightforward. Various deep learning architectures, including CNN, have been developed for emotion detection; they may be combined with other models, and each is trained on defined datasets with pre-specified data [7].

The ’os’ module in Python provides functions that can be used to count the number of images for each emotion type in a dataset. To gain a deeper understanding of the dataset and the types of images it contains, the ’utils’ module can be used to plot a few example images. Previous research indicates that the average achievable accuracy of a model trained on the FER-2013 dataset is 66.7%. Our goal is to design a CNN model with similar or better accuracy, as stated in various research papers [7].

For this study, we utilized the JavaScript Face API (face-api.js), an open library available on GitHub. This library includes various models, such as face landmark detection, gender recognition, face detection, age estimation, face recognition, and emotion recognition. In this project, we employed three of these models: face detection, emotion recognition, and face expression recognition. Specifically, for facial expression detection, we implemented face-api.js using SSD MobileNetV1, which is based on CNN, as described in Sawyer’s research [9]. We make the following contributions:

  • A real-time emotion recognition framework is designed to capture and demonstrate students’ emotions while they are using online learning platforms. This real-time aspect is particularly valuable as it allows for immediate feedback and analysis of emotions during the learning process, providing new insights into students’ emotional responses to online education.

  • The proposed model is implemented using SSD based on MobileNetV1. This choice of model architecture and its application to emotion recognition is a novel approach that may offer advantages in terms of efficiency and accuracy compared to traditional models.

  • The open CK+ and JAFFE datasets of facial images are utilized to evaluate the performance of the proposed model.

  • The superiority of the proposed model is shown by comparing its performance with previous state-of-the-art approaches. This comparative analysis validates the effectiveness of the model and establishes it as a cutting-edge solution in the field of emotion recognition for online learning.

This research work is structured as follows: Section 2 presents previous work and research conducted in related domains. Section 3 describes the data description and preprocessing steps utilized in the experimental and implementation phases. Section 4 outlines the proposed methodology. Section 5 presents the experimental results and a scientific evaluation of the study. Finally, Section 6 concludes the proposed study.

2 Related work

Today, advances in digital communication are taking students into a new dimension. The scope of virtual schools is increasing worldwide, and computer-based education is considered necessary for the future [7]. In [10, 11], the authors perform experiments on facial expression recognition and conduct studies of virtual learning using various tools. In this section, we review existing works on emotion recognition [10, 11].

Ekman believes that most facial expressions are blends of several feelings [12]. Non-verbal expressions are by far the most challenging to recognize and understand clearly. However, emotional expressions can be used as signals: wrinkling an eye is considered a gesture of approval, and sticking the tongue out is deemed a signal of playful distaste. In [13], Diaz states that, of all non-verbal channels, emotional expression is the least controversial in communication. Hence, emotional expressions are the most widely examined group of gestures. Therefore, one needs to focus more on the face than on other parts of the body, as its meaning is widely accepted. Expressions on the face can convey a great deal, such as shock and surprise [13].

Emotional expressions result from muscle movements and motion. Movement conveys a great deal of emotion, so emotional expressions are considered non-verbal communication whose primary purpose is to convey social information to other humans. Diaz notes that, considering how many muscles in the human face can move, the wide range of emotional expressions produced is remarkable [13]. While teaching, teachers make effective use of these emotions. The authors of [14, 15] describe attention-based modeling for emotion detection, detecting emotion contrasts to understand students’ interests. Teachers can thus easily recognize whether students are satisfied with their teaching method or need more attention [14, 15]. In this way, the proper use of emotion detection helps the teacher understand students while teaching in the classroom, which eventually leads to better learning outcomes [16].

Virtual learning uses audio, video, Word documents, slides, and PDFs to simulate the learning environment as closely as possible. Virtual environments are used for a variety of pedagogical purposes, such as distance learning. Worldwide, the benefits of computer-mediated virtual schools are increasing day by day and are considered critical for the future. Virtual education delivers instruction in a learning environment where teacher and student are separated by time, distance, or space. The teacher provides course management through multimedia applications, the internet, and conferencing, and students communicate with the teacher using these technologies [17]. Students learn from virtual books on devices such as laptops, mobile phones, and iPods. Compared with learning from physical books, online learning through soft copies provides apparent flexibility and adaptability: students can access more information through the internet while sitting in a single place, without going anywhere to search for related books or other material [18].

When learning in a real classroom with face-to-face communication, students can see each other and recognize each other’s expressions. Accordingly, many virtual classes have implemented regular online courses and allow students to chat with classmates through video conferencing. The virtual classroom is considered analogous to the physical one, so reading is regarded as a form of virtual communication used for learning and teaching in a virtual environment. Students consciously and unconsciously receive and send facial cues many times per day during lectures, classroom sessions, and online reading. Attending to these cues in online learning serves two primary purposes: to better receive students’ messages and to send positive signals that support learning more skillfully. It is therefore essential to use non-verbal communication and to detect non-verbal expressions for better online learning [9].

Interactions with one another, whether student to student or student to teacher, play a vital role in the classroom environment, so the impact of communication through facial expressions is powerful [9]. Facial signals are rich in expression, conveying much information about individual identity, mental state, and mood. Several studies reveal that facial expressions are the most prominent and expressive way to display emotions [19]. Facial expressions are, next to words, the primary source of information for determining internal human feelings. People and students alike use facial expressions during lectures to form impressions of one another [20]. A study reveals that facial expressions help students stay motivated and interested while taking lessons or learning online [21]. Thus, a lecturer can also use students’ facial expressions as a valuable source of feedback on whether students understand the course. According to Sathik, a lecturer can detect students’ facial expressions and recognize whether to slow down, speed up, or adopt another way to improve or change the presentation [22]. To optimize students’ behavior in class while learning, the basic strategy is for teachers to sense the student’s state of mind, so they must be good observers of the student’s expressions, movements, and actions. This strategy helps to understand the student’s strengths and weaknesses and to adapt teaching to suit their learning [23].

Facial expressions that show emotion include muscle movements such as wrinkling, eyebrow raising, lip curling, and eye rolling. According to medical studies, when students feel uncomfortable, they lower their eyebrows, furrow their brows, wrinkle their foreheads in vertical and horizontal directions, and struggle to maintain eye contact [19]. To detect a student’s expression correctly, one has to be familiar with the many subtle non-verbal cues that most students exhibit. Studies show that a student’s emotional state is expressed through specific behaviors that can be detected automatically, so capturing cues from the forehead, mouth, eyes, and nose plays an essential role in recognizing facial expressions with an automated system. Automated systems can extract the expression from the underlying emotional state and categorize spontaneous facial expressions [19]. Since the discussed research works report different approaches and results, a critical summary of these works is presented in Table 1.

In recent years, computer vision has seen remarkable advancements, and numerous research works have leveraged its potential to analyze and understand human emotions. Several state-of-the-art research studies involve the application of computer vision using libraries like OpenCV, Dlib, TensorFlow, or PyTorch to detect and recognize facial expressions, gestures, and emotional cues. While most state-of-the-art audio-visual fusion methods use recurrent networks or conventional attention mechanisms, they often fail to effectively leverage the complementary nature of these modalities. To address this, the paper [24] proposes a joint cross-attention fusion model that efficiently exploits the inter-modal relationships between facial and vocal modalities extracted from videos. Another paper [25] focuses on deep learning techniques, particularly convolutional neural networks, for facial expression recognition tasks. It proposes an occluded expression recognition model that comprises two modules: occluded face image restoration and face recognition.

Table 1 Summary of research works on emotion detection

Table 2 presents the summary of current research works on education and emotions. By presenting a variety of research topics, methodologies, and findings, the summary table demonstrates the multidimensional importance of research works in education, as they contribute to a deeper understanding of various factors influencing teaching, learning, and academic outcomes. These studies collectively enrich the knowledge base and provide valuable insights for enhancing educational practices and student experiences.

The scope of facial expressions among students needs further investigation. Several researchers have applied different methods for correct emotion identification; for example, the CNN deep learning model has been widely used for this task. However, the accuracy of such approaches is low, and they lack real-time emotion detection, which is important for online learning platforms. Similarly, detecting a reader’s emotions while reading books online is not well investigated. Therefore, this study develops a web-based emotion detection application implemented in JavaScript. JavaScript was chosen because it is among the most important languages for web development, enterprise-class applications, and Android development. With the information technology industry growing rapidly and the demand for JavaScript increasing, we considered JavaScript the most suitable language for our experiments. This research holds substantial importance in advancing emotion detection technology, enhancing online learning platforms, and providing valuable insights into learners’ emotional experiences. Its high accuracy, real-time feedback capabilities, and potential for educational interventions make it a valuable contribution to the field of education and technology.

Table 2 Summary table of the research works on education and emotion

3 Design science and research methodology

Design science research (DSR) is the systematic study of developing practical solutions that emerge from real-life settings [38]. DSR uses a rigorous process to design artifacts that solve practical problems, to evaluate those designs, and to make the research more contributive and communicable. The DSR framework is presented in Fig. 1.

Fig. 1
figure 1

Design science research framework

3.1 Problems with emotion detection system

Various technical solutions for detecting human emotion are used by firms to improve responsiveness, support decisions about candidates during interviews, and optimize the emotional impact of their systems through improvements to human and customer service. Studies have shown that many solutions in this field rely on outdated psychological theories and cannot always be trusted [39]. While reading PDF books, teachers and students face many challenges in learning activities due to ineffective achievement goals involving cognitive skills, low morale, low motivation towards reading, and low self-esteem.

3.2 Design and develop an artifact

By analyzing the evaluated designs and models, we determined that the main functional requirement is a system that can continuously detect human emotion while a student reads online text, which is essential for teachers to understand students’ emotions. According to [40], with the advancement of networked communication, social media users express a massive volume of feelings, so emotion extraction is necessary and frequently performed. Therefore, an analysis of the requirements for emotion detection is very important and has diverse applications [40]. One requirement is that the system must detect emotion continuously, because emotion is a continuous process. The authors of [41] emphasize that a system is required that can accurately capture emotions while people are reading, together with the human associations behind them. A system adaptive to lexical variations provides a fine-grained quantitative assessment to detect human emotions accurately [1]. Another requirement is detecting human feelings when readers raise a question or encounter a problem while reading text from a PDF. When students learn, they generate questions at irregular intervals. A system must detect the feelings and cues whenever students have a question, so sentiment analysis is needed.

3.3 Demonstrate artifact

For this purpose, machine learning is needed to detect human emotions such as aggressiveness, anger, sadness, happiness, surprise, or questioning. Another requirement found in the literature is that CNN models are used with images only, so deep learning is required to continuously detect human emotions and changes in feelings while a person reads text [42]. Previous studies show the vital role of machine learning in modeling the interaction of emotions, representing text with a sparse bag of words. A method is therefore required that combines sparse and dense representations of human behavior [43].

3.4 Evaluate artifact

In this paper, a model for emotion detection is therefore proposed using JavaScript. We used the JavaScript library because the functions and applications that are indispensable to the internet are largely coded in JavaScript. According to a survey, 94.5% of websites use JavaScript, whether for reading material or other purposes. We also found in the literature that JavaScript has not been well studied for emotion detection while reading text from a PDF. Therefore, it is desirable to investigate the use of JavaScript for real-time emotion detection [44].

3.5 System architecture

The study conducted emotion recognition tests on students while they read a short article, using the JavaScript Face API. The videos were captured using a Nikon D5300, a digital single-lens reflex (DSLR) camera with an 18-140 mm lens. The dimension of the recorded video is \(11903 \times 13096\) pixels (width \(\times \) height). An emotion detection algorithm captured students’ emotions after each paragraph with a 5-second delay. The face recognition application recorded emotions once users allowed camera access, securely storing the data. The real-time emotion recognition system offered valuable insights into students’ emotional responses while engaging in online learning. To evaluate the proposed model, we also employ the open CK+ and JAFFE datasets of facial images. Using these datasets for emotion recognition in the context of online learning is a novel application, showcasing the model’s performance in realistic scenarios and ensuring transparency in the evaluation process.

The system architecture diagram for the emotion detection system is shown in Fig. 2. In this research, the target population is all types of learners, so emotions are detected rapidly while the learner reads the online text. To cover every expression at five-second intervals, we use a small face detector; a real-time face detector is much faster, smaller, and consumes fewer resources. Using face-api.js, emotion detection is performed by a model that recognizes the face and its expressions. SSD based on MobileNetV1 is implemented to detect emotions every 5 seconds. For each detected face, the model returns a bounding box together with a face probability. This research aims to obtain highly accurate data for students in education systems who are reading online texts. We use this model to detect every facial expression every five seconds to achieve higher accuracy [16]. The model is small, lightweight, and web-friendly, making it a suitable face detector for web clients and resource-limited devices.
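To illustrate this step, the following is a minimal sketch of loading the SSD MobileNetV1 detector from face-api.js and polling the camera feed every five seconds. It assumes face-api.js is included in the page, that model weights are served from a hypothetical `/models` path, and that a hypothetical `<video>` element with id `reader-camera` shows the live preview.

```javascript
// Sketch only: assumes face-api.js is loaded via <script> and the model
// weights are hosted at the hypothetical path /models.
const video = document.getElementById('reader-camera'); // hypothetical <video> element

async function startDetection() {
  // Load the SSD MobileNetV1 face detector used in this work
  await faceapi.nets.ssdMobilenetv1.loadFromUri('/models');

  // Poll every 5 seconds, matching the capture interval described above
  setInterval(async () => {
    const detections = await faceapi.detectAllFaces(
      video,
      new faceapi.SsdMobilenetv1Options({ minConfidence: 0.5 })
    );
    // Each detection carries a bounding box and a face probability (score)
    detections.forEach(d => {
      console.log('box:', d.box, 'probability:', d.score);
    });
  }, 5000);
}

startDetection();
```

The five-second interval mirrors the capture cycle of the proposed system; the `minConfidence` value is illustrative and can be tuned per deployment.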

Fig. 2
figure 2

System Architecture diagram for the emotion detection system

This model occupies only about 190 KB, and the face detector was trained on a custom dataset of 14K images labeled with bounding boxes. Because the bounding boxes cover the full facial feature points, the detector generally produces better results in combination with subsequent landmark detection. Since people have faces of different sizes and shapes, the tiny face detection model is used so that each type of face can be detected and its expression recognized more precisely [16].
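A small sketch of configuring the tiny face detector follows; the `inputSize` and `scoreThreshold` values are illustrative assumptions, not the settings used in our experiments.

```javascript
// Sketch only: illustrative tiny face detector configuration (~190 KB model)
async function detectWithTinyFaceDetector(video) {
  await faceapi.nets.tinyFaceDetector.loadFromUri('/models');

  const options = new faceapi.TinyFaceDetectorOptions({
    inputSize: 416,      // smaller input is faster; larger input finds smaller faces
    scoreThreshold: 0.5  // minimum face probability to accept a detection
  });

  return faceapi.detectSingleFace(video, options);
}
```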

For face recognition, an architecture similar to ResNet-34 is implemented to compute a face descriptor from a given face image. One benefit of this model is that two arbitrary faces can be compared through their face descriptors, for example by computing the Euclidean distance between them or by using a classifier of one’s choice. For this purpose, face-api.js provides a face recognizer net, which is equivalent to the neural network used in dlib’s face recognition.
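As an illustration, a minimal sketch of comparing two faces by the Euclidean distance between their descriptors is shown below; the input images and the 0.6 threshold are illustrative assumptions.

```javascript
// Sketch only: comparing two faces via the Euclidean distance between their
// 128-dimensional descriptors. imageA/imageB and the threshold are illustrative.
async function sameFace(imageA, imageB) {
  await faceapi.nets.faceRecognitionNet.loadFromUri('/models');

  const descriptorA = await faceapi.computeFaceDescriptor(imageA); // aligned face image or canvas
  const descriptorB = await faceapi.computeFaceDescriptor(imageB);

  const distance = faceapi.euclideanDistance(descriptorA, descriptorB);
  return distance < 0.6; // commonly used threshold; tune per application
}
```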

The next step is detecting changes in emotion. To implement this, a feature extraction layer is employed together with regression and recognition layers, and the model classifies expressions based on the extracted features. The model is only about 420 KB in size, so its feature extraction and emotion detection stages employ a tinier architecture. The model’s performance was evaluated using databases such as UTK, Chalearn, Wiki, IMDB, CACD, MegaAge-Asian, and FGNET. Overall, the main functions of this architecture, with its inputs and outputs, are the following:

  • Recognize human emotions while reading online text

  • Highlight the face type

  • Give accurate results

Tracking human emotion while reading online text is an essential target of this research. The second target is to recognize the changing expressions of the person reading the online text, so that emotion recognition is more reliable. The third function is the detection of face type; for example, if a person has a small face, recognition is performed using the Euclidean distance. The last function is obtaining highly accurate results. The whole process is repeated every five seconds to achieve accuracy and detect every expression [45].

4 Model implementation

The evaluated framework operates on an online text file read by a specific user in a web environment. In this environment, the facial emotion detection algorithm localizes interest points for feature extraction. The whole architecture is implemented on a web platform [18]. The use case diagram for the emotion detection system is shown in Fig. 3.

Fig. 3
figure 3

Use case diagram for Emotion Detection System

4.1 JavaScript face-API

Using face-api.js, the detection of emotions and the recognition of face landmarks are implemented with CNNs. Several modules are available in face-api.js, such as face detection and recognition of facial emotions.
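A minimal sketch of chaining these modules is shown below; the `/models` path and the `reader-camera` element id are illustrative assumptions.

```javascript
// Sketch only: chaining the face-api.js modules used in this work.
async function detectEmotion() {
  await faceapi.nets.ssdMobilenetv1.loadFromUri('/models');
  await faceapi.nets.faceLandmark68Net.loadFromUri('/models');
  await faceapi.nets.faceExpressionNet.loadFromUri('/models');

  const result = await faceapi
    .detectSingleFace(document.getElementById('reader-camera'))
    .withFaceLandmarks()
    .withFaceExpressions();

  if (result) {
    // result.expressions maps each label (neutral, happy, sad, ...) to a probability
    const [topEmotion, probability] = Object.entries(result.expressions)
      .sort((a, b) => b[1] - a[1])[0];
    console.log(`Dominant emotion: ${topEmotion} (${probability.toFixed(2)})`);
  }
}
```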

4.2 Algorithm

The proposed approach aims at emotion detection while the reader interacts with online reading material. We calculate the time spent on a single paragraph or a single page from the page’s start and end times, and capture images to detect emotions.

The Face API is used to capture the face’s appearance so that emotion recognition is more accurate.

When the user opens a particular text on the platform and starts reading, their appearance is captured and saved. The Face API processes the image for emotion recognition.

That first image is saved to the database with the recorded start time. When the user starts reading the text, the time is recorded and an image is captured. Every five seconds thereafter, another image is captured and stored in the database; these images are used for emotion detection. When the user turns to the next page, the end time of that page is recorded as well. The total time spent on a single page is computed from the recorded start and end times, and the average time spent on a single paragraph is calculated as:

Average time spent on a single paragraph = total time spent on the page / number of paragraphs on the page

Applying this equation, suppose the calculated value is 4 minutes; this means that the average time spent on a single paragraph is 4 minutes. We conclude that we can easily find how much time a user spends reading each section of a page. All images captured every five seconds are used for emotion detection. We thus have a dataset consisting of face images and the time spent reading the online text. Annotation is then applied to the face images by focusing on various face regions to detect emotions: coordination points such as the mouth, nose, eyes, and lips are used to detect the reader’s feelings. After comparison with reference images, the emotion is detected by the facial recognition model from a descriptor computed from the facial appearance. The height and width of the image are also recorded and attached to the detections. Finally, unique features are obtained through which the exact emotion is recognized.
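A small sketch of this timing and capture bookkeeping is given below; the record structure, helper functions, and page-turn hooks are hypothetical and shown only to make the calculation concrete.

```javascript
// Sketch only: timing and capture bookkeeping with hypothetical helpers.
let pageStart = null;
let captureTimer = null;
const captures = []; // in the real system these records are stored in the database

function captureFrame() {
  // Hypothetical helper: grab a frame from the video preview and queue it
  captures.push({ time: Date.now() /*, imageBlob: ... */ });
}

function onPageOpened() {
  pageStart = Date.now();
  captureTimer = setInterval(captureFrame, 5000); // capture an image every 5 seconds
}

function onPageTurned(numParagraphs) {
  clearInterval(captureTimer);
  const totalSeconds = (Date.now() - pageStart) / 1000;
  // Average time per paragraph = total page time / number of paragraphs
  const perParagraph = totalSeconds / numParagraphs;
  console.log(`Page time: ${totalSeconds}s, avg per paragraph: ${perParagraph.toFixed(1)}s`);
}
```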

4.3 API to integration

We can integrate this system with any other online platform for emotion detection; for now, and for testing purposes, we integrated it with an online learning platform. The learning platform provides opportunities to create courses of various types, taking into account the relevant standards and the interactions between tutors and students that stimulate students’ creativity. When a student reads a particular text on the platform, the system starts working.

In the proposed architecture, emotion detection requires a preprocessing algorithm written in JavaScript and embedded in the web page. The general steps of the interface are listed below (a minimal sketch follows the list):

  • At run time, detect and access the camera hardware, if present.

  • Create a preview of the person reading the online text via a surface that displays the live image.

  • Build the preview layout, treating the preview as a user interface control configured according to the user’s choice.

  • Set up capture listeners so that the user’s actions and responses are recorded during reading.

  • Capture and save the detected emotion once the image preview in the layout is built, so the system is ready to capture the next expression or emotion.
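As referenced above, the following is a minimal sketch of the first two steps: requesting camera access and wiring the live preview. The `reader-camera` element id is an illustrative assumption.

```javascript
// Sketch only: requesting camera access and wiring the live preview.
async function startPreview() {
  const video = document.getElementById('reader-camera');
  try {
    // Emotion capture only begins once the user grants camera permission
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    video.srcObject = stream;
    await video.play();
  } catch (err) {
    console.error('Camera access denied or no camera present:', err);
  }
}

startPreview();
```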

5 Evaluation of the emotion detection application

To evaluate the emotion detection application, students were recruited to test it. The primary purpose of the evaluation is to assess how users feel and to derive data from the real-life situations of online learners by showing their emotions while reading standard text [46]. We evaluated whether students found the platform interesting, since they could see their expressions through live graphs while reading. In addition, we intend to conduct a further evaluation to obtain a deeper analysis of students’ behavior while studying online and of how useful teachers found the application for emotion detection. It is hoped that the application will provide real-time analysis that helps teachers support their students while studying online.

5.1 Learning activity and experiment design

In this research, the whole activity is described in different phases. In the first phase, we provided an overview of the emotion detection system, after which students logged into the system and read a short article. While a student was reading, the system recorded the student’s facial expressions and stored them in the database. In the second phase, we plotted charts, graphs, and a table from the stored data.

5.2 Data collection

We collected data from students through the emotion detection application while they tested the application online. The application recorded their emotions and stored them in the database. These data are analyzed in the data analysis section.

5.3 Data analysis

In this section, we discuss the findings of the analysis of data collected from students who read an online text. To detect emotions, we utilized the JavaScript Face-API and publicly available datasets, including CK+ Facial Expression, JAFFE, and KDEF. These datasets contain more than 2,000 images featuring various facial expressions.

In this process, the user grants camera access while reading, and the application starts streaming video of the user’s face. The system then detects the face and indicates the facial landmarks, which are points used to indicate the positions of facial muscles. The students who logged in to the system read a short article, and the system recorded their facial expressions (see Table 3).

Table 3 Student’s expressions while reading a short article

5.4 Results

The system was tested with different students/learners while they read a short article through the system, and we obtained different emotions from every student. As they read each paragraph, their emotions changed. In this study, we detected the emotions of students after each paragraph using the JavaScript Face API: every 5 seconds, the emotion detection algorithm detects what students are feeling after reading a particular paragraph.

The face recognition application only starts recording emotions after the user has allowed it to access the camera. Through video streaming, the system starts detecting students’ emotions, which are securely stored in the database. Figure 4 presents the descriptive analysis of students’ emotions obtained while reading the short text. This short text was used to conduct a preliminary evaluation of the emotion detection application.

Fig. 4
figure 4

Illustration of Emotions through charts

Table 4 describes the scenario of students who read paragraph 1 of page 2. We intend to understand the situation of students studying online by presenting an analysis of their emotions in a single instance. For example, students’ emotions while reading paragraph 1 of page 2 show varied expressions. While Student 1 recorded 7 neutral expressions reading the text “I once asked a clinical colleague to describe what it had been like working as a trainee with a world-famous surgeon”, Student 3, Student 4, and Student 5 recorded 3, 9, and 2 neutral emotions, respectively. This result suggests that the same text had a different impact on different students.

Table 4 Student’s emotions while reading a particular paragraph of a short article

5.5 Discussion

The proposed system’s emotion detection has proven to be highly efficient, as evidenced by the results we obtained. Previous studies have utilized various techniques, such as vector descriptors for facial motion, active contours for mouth and eye shape retrieval, and 2D deformable mesh models [3]. In contrast, the proposed system uses a CNN-based model for feature extraction and recognition, which is a cutting-edge approach to emotion detection [3]. Additionally, this face expression recognition model was trained on a diverse set of images from publicly available datasets [16]. However, certain physical factors such as hardware limitations, camera quality, low light conditions, and wearing glasses may reduce the accuracy of the results.

The proposed system is quite helpful for teachers to understand learners’ emotions. The application detects learners’ emotions every five seconds, meaning teachers can instantly understand a learner’s mood or emotion and adjust their teaching method according to the learner’s behavior. Concerning the students who participated in the experiments, most gave neutral expressions while reading; the bar for neutral expressions is far higher than the others (see Fig. 4), but we also observed other expressions, i.e., happy, sad, and surprised. Most students only change expression when they find something different while reading. The expressions also depend on the content; if the content is funny, we are more likely to observe happy expressions.

Table 4 lists the expressions of five students while reading paragraph 1 of page 2. The count differs from student to student: Student 1, Student 3, Student 4, and Student 5 read the same text, but their expression counts vary. On the other hand, Student 2 read a different passage, “His response was surprising and troubling - “Oh... he’s brilliant and talented, of course, but to be honest - he’s dangerous””, and gave a different expression. Moreover, looking at the time durations in Table 4, the students who spent more time on paragraph 1 produced more emotions. For instance, Student 1 and Student 4 spent almost 1:30 minutes on paragraph 1, and the application recorded 7 and 9 neutral expressions, respectively, whereas Student 3 and Student 5 spent only a few seconds and the application recorded 3 and 2 neutral expressions, respectively.

5.6 Comparison with the existing state-of-the-art studies

An ideal system for real-time facial emotion recognition would typically consist of the following components (a brief illustrative sketch of the output stage follows the list):

Face detection:

A face detection algorithm that can accurately detect the presence and position of faces in real-time video or image streams.

Feature extraction:

A feature extraction algorithm that can extract relevant features from the face, such as the position and shape of facial landmarks, facial textures, and the movement of facial muscles, in real-time.

Preprocessing:

A preprocessing stage that normalizes the images, removes noise and enhances contrast to improve the accuracy of feature extraction in real-time.

Classification model:

A machine learning model, such as a deep neural network or a support vector machine, that can learn from the extracted features to classify different facial expressions in real-time.

Training data:

A large and diverse dataset of annotated facial expressions that can be used to train the classification model.

Real-time data processing:

A data processing framework that can handle the real-time streaming of video or images and process the extracted features and classification results in real-time.

Output and feedback:

A real-time output and feedback system that can display the classification results and provide feedback to the user in real-time, such as using audio, visual, or haptic signals.
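As referenced above, a minimal sketch of such an output and feedback stage is shown below, overlaying bounding boxes and expression scores on a canvas placed over the video preview. The element references, refresh rate, and use of face-api.js drawing utilities are illustrative assumptions rather than the exact output module of the proposed system.

```javascript
// Sketch only: a possible output stage that overlays detections and expression
// scores on a canvas positioned over the video preview.
function startOverlay(video, canvas) {
  const displaySize = { width: video.videoWidth, height: video.videoHeight };
  faceapi.matchDimensions(canvas, displaySize);

  setInterval(async () => {
    const results = await faceapi
      .detectAllFaces(video)
      .withFaceLandmarks()
      .withFaceExpressions();
    const resized = faceapi.resizeResults(results, displaySize);

    canvas.getContext('2d').clearRect(0, 0, canvas.width, canvas.height);
    faceapi.draw.drawDetections(canvas, resized);       // bounding boxes
    faceapi.draw.drawFaceExpressions(canvas, resized);  // expression labels and scores
  }, 1000); // illustrative refresh rate
}
```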

Table 5 Performance evaluation of the proposed system against previously proposed systems for automatic facial emotion recognition

Overall, an ideal real-time facial emotion recognition system should be accurate, fast, and able to detect and classify a wide range of facial expressions under different lighting conditions, poses, and occlusions in real time. We have devised some parameters in Table 6 in order to analyze the quality of a good facial emotion recognition model. The comparison of the proposed model with state-of-the-art research is shown in Table 5.

Table 6 Characteristics of a novel facial emotion recognition tool

5.7 Limitation of the study

The proposed model for real-time emotion detection during online learning shows promising results with high accuracy. However, it does have certain limitations that need consideration. One limitation is the challenge of accurately capturing and interpreting complex emotional nuances. Emotions are intricate and can vary greatly based on individual experiences and cultural backgrounds, which may impact the precision of the model’s emotional recognition.

Furthermore, ethical concerns arise when dealing with students’ emotional data, as it involves privacy issues and the responsible use of personal information. Safeguarding the collected emotional data and ensuring its proper use is of utmost importance to protect the students’ well-being and rights.

Another aspect that requires attention is the generalization of the model across diverse populations. The accuracy achieved with specific datasets, such as CK+ and JAFFE, may not translate uniformly to other demographics and contexts, leading to potential biases and reduced reliability when applied in real-world scenarios.

To maximize the effectiveness of the real-time emotion detection system, ongoing research and development are needed to refine and expand the model’s emotional recognition capabilities. Additionally, comprehensive validation with a broader and more diverse set of datasets will be crucial to ensure its adaptability and accuracy across various learning scenarios.

Considering these limitations and addressing them in the implementation and future iterations of the model will contribute to a more robust and reliable system, enabling valuable insights into students’ emotional responses during online learning and supporting their overall educational experience.

6 Conclusion

Emotion detection while reading online text is a thriving research area, and with the rise of online learning systems, the need to detect human expressions/emotions has increased. Existing methods use vector descriptors for facial motion along with contours for mouth and eye shape, which do not provide good accuracy. This study presents an emotion detection approach designed to obtain high-quality, robust, and accurate results, with emotion detection carried out every five seconds. Experiments were performed on the CK+ and JAFFE face databases, and the obtained results show accuracies of 96.46% and 98.43%, respectively. We observed that the number of expressions depends on the time a student spends reading a particular text, and the type of expression also depends on the content and the student’s behavior.

In the future, we intend to include more participants in this research. We also plan to extend this research to teachers, investigating how they can obtain useful results in the form of feedback from students. Dealing with other challenges, including camera position, lighting conditions, and noise, is also left for future work.