
1 Introduction

Currently, one of the main problems related to learning is the amount of attention that students dedicate to the execution of a proposed task. Each person's level of attention is increasingly affected by the growing use of the Internet and social networks. These two factors have a strong impact on attention, as they offer large amounts of general-interest information that draws students' focus away from their work. With information about students' attention, the manager (e.g. a teacher) could prevent undesirable situations and improve the users' attention. It is therefore crucial to improve the learning process and to solve problems that may occur in environments that use new technologies [4].

Attention is a complex process through which an individual continuously analyses a set of stimuli and, within a sufficiently short period of time, chooses one to focus on. Most people can only focus on a very small group of stimuli at a time, which prevents them from attending to every stimulus that carries noticeable information.

Adaptive learning can be a powerful learning tool. Learning skills are vital to school success and employment and are becoming increasingly important for social communication.

Over the last few decades, researchers and developers have worked to create and improve educational technologies. These technologies have evolved from simple teaching aids into tools that promote high-level reasoning in the academic field. By monitoring students' behaviour, it is possible to improve the effectiveness of mentoring systems, giving students relevant materials that enable their evolution and providing appropriate feedback and scaffolding [8]. The main goal of Intelligent Tutoring Systems (ITS) is to make these technologies adaptable to students, taking into account their individual characteristics and needs.

In this paper we present a non-invasive approach for a mentoring system based on the behavioural biometric analysis of students' work during high-end tasks. More specifically, the system monitors and analyses mouse dynamics, keystroke dynamics and task activity in order to determine the student's performance.

2 Related Work

The rapid evolution of information and communication technologies (ICT) over the last decades has benefited all areas of knowledge. According to [10], ICT was applied to Education rather late. From these technologies emerged Virtual Learning Environments (e-learning), in which students interact as if they were in a real environment. When these environments are combined with applications that provide intelligent tutoring, they are called Virtual Environments with Smart or Intelligent Tutoring.

These Intelligent Tutors aim to adapt to the student's profile, applying the techniques that best suit each student in order to obtain better learning results. A number of such tutors currently exist; however, they do not fully achieve the desired goals, since they do not consider an important element that affects student learning: the student's emotional state. Some of these tutors only assess the emotional state of the student at the end of the work sessions, which alone is not enough to improve learning.

2.1 ITS

In the educational field it is important to have an adaptive and intelligent system to improve learning. An adaptive system is a system that behaves differently for different students or groups of students, taking into account the information accumulated about an individual or group of students over time. An intelligent system is a system that applies Ambient Intelligence (AmI) techniques in order to provide better and greater support to the users of educational systems [2]. However, most of these systems are either adaptive or intelligent, but not both. Examples of adaptive systems are AHA [1] and WebCOBALT [15], which use very simple techniques that can scarcely be classified as "intelligent". Examples of intelligent systems are the German Tutor [7] and SQL-Tutor [11]; yet, these systems provide the same solutions for different types of students [2].

The ITS must consider the curriculum, the affective state of the students and their learning style, and adapt the tasks and the type of presentation in order to obtain better results.

2.2 Attention

Attention is the capacity to focus clearly on one of several subjects or objects. Attention implies mental concentration on an object through observation or listening, that is, the ability or power to concentrate mentally.

The concept of attention can be defined as the transformation of a large set of scattered and unstructured data into a small set of acquired data in which the key information is preserved. In Computer Science, attention means that an input filter chooses the most important data for processing, and this is a key mechanism of behavioural control for tasks. This type of process is related to planning, decision-making and dealing with new situations, although it relies on limited computational capabilities [9, 14].

3 Framework

The architecture of the developed system (shown in Fig. 1) is divided into three main parts: the lowest level with the devices that generate the data; the intermediate level where the ITS cloud is located; and the highest level, the client system.

At the lower level, the devices that generate the raw data (e.g. soft sensors) describe the students' interaction with both the mouse and the keyboard. The generated raw data are stored locally and synchronised with the ITS web server in the cloud at regular intervals. In this layer, each event is encoded with the corresponding required information (i.e. timestamp, coordinates, type of click, key pressed, etc.), as sketched below.
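The following is a minimal sketch (in Python) of how such a raw event could be encoded before being buffered locally and synchronised with the cloud; the field names and the `RawEvent` structure are illustrative assumptions, not the system's actual schema.

```python
import time
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class RawEvent:
    user_id: str                     # anonymised student identifier
    event_type: str                  # e.g. "mouse_move", "mouse_click", "key_down", "key_up"
    timestamp: float                 # epoch time in milliseconds
    x: Optional[int] = None          # cursor coordinates (mouse events only)
    y: Optional[int] = None
    button: Optional[str] = None     # "left" / "right" (click events only)
    key: Optional[str] = None        # key pressed (keyboard events only)
    app_name: Optional[str] = None   # foreground application (application-switch events)

event = RawEvent(user_id="s01", event_type="mouse_click",
                 timestamp=time.time() * 1000, x=412, y=318, button="left")
local_buffer = [asdict(event)]       # kept locally until the next sync with the web server
```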

The intermediate level is subdivided into five layers: the storage layer, the analytic layer, the profile classification layer, the emotion classification layer, and the adaptive interaction model. In the storage layer, a MongoDB database stores the data received from users when they synchronise. Besides being a data storage engine, MongoDB also provides native data processing tools such as Map-Reduce and the Aggregation Pipeline. Both can operate on a sharded collection (partitioned across several machines with horizontal scaling).

Fig. 1. System's framework.

In the analytical layer, processes were developed to prepare the received data, such as removing outliers (for example, the backspace key being continuously pressed to delete a set of characters is not a regular key press), so that the data can be evaluated according to the presented metrics. In addition, the system receives this information in real time and calculates, at regular intervals, the values of the behavioural biometrics and the estimate of the general level of attention of each student. MongoDB's aggregation tools are powerful for performing analytical and statistical analysis in real time, which is useful for ad-hoc querying, pre-aggregated reports, and more. MongoDB provides a large set of aggregation operations that process data records and return computed results; using these operations in the data layer simplifies application code and limits resource requirements.
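To illustrate the kind of interval-based aggregation described above, here is a minimal sketch using pymongo; the database and collection names, the field names and the specific pipeline are assumptions, not the system's actual queries. It counts, per student, the clicks and keystrokes recorded within a time window.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client["its"]["events"]   # hypothetical database/collection names

# Example 5-minute window, expressed in epoch milliseconds.
window_start, window_end = 1_600_000_000_000, 1_600_000_300_000

pipeline = [
    {"$match": {"timestamp": {"$gte": window_start, "$lt": window_end}}},
    {"$group": {
        "_id": "$user_id",
        "clicks":     {"$sum": {"$cond": [{"$eq": ["$event_type", "mouse_click"]}, 1, 0]}},
        "keystrokes": {"$sum": {"$cond": [{"$eq": ["$event_type", "key_down"]}, 1, 0]}},
    }},
]
per_student = list(events.aggregate(pipeline))
```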

In the profile classification layer, all user indicators are interpreted. Based on the preprocessed data and on the metadata built to support decision making, the system classifies the user's profile. When the system has a sufficiently large set of study cases, it can make these classifications accurately. In real time, the classifier labels the received data with the different levels of attention, creating the learning profile of each student. With these results, it is possible to obtain a profile of the student's learning style.

The emotion classification layer holds all the users' emotional data and the metadata built to support decision making. The system classifies the user's emotional profile and, once it has a sufficiently large set of data, it can make these classifications accurately. Note that mouse movements and keyboard usage patterns also help predict the mood of the user.

The adaptive interaction model is based on both the profile classification layer and the emotion classification layer, and adjusts the level of difficulty of the tasks for each user in real time.

Finally, a web application is available at the client layer, where students can visualise the tasks they must complete. Moreover, for the managers (teachers), the users' attention information is displayed at this layer. The graphical user interface at this layer includes a module that allows the creation of charts (CHART) and of virtual teams (ROOM or classes), so that the administrator can intuitively visualise the students' behaviour.

4 Behavioural Features

Based on the framework described in Sect. 3, a set of behavioural features is monitored and preprocessed by the proposed system. These features enable the development of a classification model capable of determining the task being executed by the user, given the influence of the user's biometric behaviours. For this study, Keystroke Dynamics, Mouse Dynamics and Attention Performance Metrics were selected to this end. To monitor students' performance, behavioural information was collected from a group of students during high-end tasks in a school environment.

4.1 Mouse Dynamics

Mouse dynamics describe an individual's behaviour with a computer-based pointing device (e.g. mouse or touch-pad). Recently, mouse dynamics have been proposed as a behavioural biometric, under the premise that mouse behaviour is relatively unique among different people. The motions of the user's hand, and by extension the movements of the computer mouse, have a direct relation with the psychological and sentimental condition of the user. To be more specific, the way in which the mouse is moved (orbit, speed, intervals of immobility, direction) can reflect the user's condition [4, 6, 12]. Based on this concept, the following features are gathered through the monitoring of the mouse dynamics (a sketch of how some of them can be derived from raw events follows the list):

  • Click Duration (cd) - time spent between the mouse down and mouse up events of a click, whenever this interval is inferior to 200 ms;

  • Distance Between Click (dbc) - total distance travelled by the mouse between two consecutive clicks;

  • Duration Distance Clicks (ddc) - time spent between two consecutive clicks (i.e. from a mouse up event to the following mouse down event);

  • Distance Point to Line Between Clicks (dplbc) - sum of the distances between each point of the path travelled by the mouse and the straight line defined by two consecutive clicks;

  • Mouse Velocity (mv) - velocity at which the cursor travels;

  • Mouse Acceleration (ma) - acceleration of the mouse at a given time;
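As referenced above, the following is a minimal sketch (in Python) of how some of these features could be derived from a time-ordered stream of raw cursor samples; the data layout and function names are illustrative assumptions, not the system's actual implementation.

```python
import math

# Cursor samples as (timestamp_ms, x, y), ordered by time.
samples = [(0, 10, 10), (40, 30, 25), (80, 60, 40), (120, 90, 60)]

def mouse_velocity(samples):
    """Average cursor velocity (mv), in pixels per millisecond."""
    dist = sum(math.hypot(x1 - x0, y1 - y0)
               for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:]))
    elapsed = samples[-1][0] - samples[0][0]
    return dist / elapsed if elapsed else 0.0

def distance_between_clicks(samples, click_interval):
    """Total path distance travelled between two consecutive clicks (dbc)."""
    t_a, t_b = click_interval
    path = [s for s in samples if t_a <= s[0] <= t_b]
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (_, x0, y0), (_, x1, y1) in zip(path, path[1:]))

print(mouse_velocity(samples))                      # average velocity over the example path
print(distance_between_clicks(samples, (0, 120)))   # path distance between two example clicks
```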

4.2 Keystroke Dynamics

Another way of monitoring the user's performance in human-computer interaction (HCI) is based on keystroke analysis. The way a user types may indicate his/her state of mind. Pressing the keys rapidly could mean an altered state such as anger or stress, while taking too much time may mean sadness or fatigue. Keystroke dynamics, which measure an individual's typing rhythms, have been the subject of considerable research over the past few decades, and their use for emotion recognition has shown promising results [3, 5, 13]. In this study, keystroke dynamics are used to analyse the attentiveness of a student. The following keystroke dynamics features are monitored (a sketch of their computation follows the list):

  • Key Down Time (kdt) - time spent between the key down and key up events;

  • Time Between Keys (tbk) - timespan between two consecutive key up and key down events;
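The following is a minimal sketch of the two keystroke features above, computed from a time-ordered stream of key events given as (event_type, timestamp_ms, key) tuples; it assumes a simple stream without overlapping key presses and is not the system's actual implementation.

```python
def keystroke_features(events):
    kdt, tbk = [], []              # Key Down Time and Time Between Keys, both in milliseconds
    last_down, last_up = {}, None
    for etype, ts, key in events:
        if etype == "key_down":
            last_down[key] = ts
            if last_up is not None:
                tbk.append(ts - last_up)             # gap between previous key up and this key down
        elif etype == "key_up" and key in last_down:
            kdt.append(ts - last_down.pop(key))      # hold time of this key
            last_up = ts
    return kdt, tbk

events = [("key_down", 0, "a"), ("key_up", 90, "a"),
          ("key_down", 250, "b"), ("key_up", 330, "b")]
print(keystroke_features(events))   # ([90, 80], [160])
```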

4.3 Attention Performance

Aside from the mentioned behavioural features, which describe the interaction of the student with the computer, the system also registers application usage by recording the user's ID, the timestamp at which the student switched to a specific application, and the application name. By default, applications that are not considered work-related count negatively towards the quantification of attention. The following attention performance features are monitored (a sketch of their computation follows the list):

  • Activity Timer - time between the start and the completion of the task;

  • Main App. Total Time Usage - total time spent in the task solver application (i.e. the Adobe Photoshop app.)

  • Main Application Percentage Usage - usage percentage of the task solver application;
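The following is a minimal sketch of how these metrics could be computed from an application-switch log of (timestamp_ms, app_name) records; the main application name and the log format are illustrative assumptions.

```python
def attention_metrics(app_log, task_end_ms, main_app="Adobe Photoshop"):
    activity_timer = task_end_ms - app_log[0][0]       # time from task start to completion
    main_total = 0
    for (t0, app), (t1, _) in zip(app_log, app_log[1:] + [(task_end_ms, None)]):
        if app == main_app:
            main_total += t1 - t0                      # time spent in the task solver application
    main_pct = 100.0 * main_total / activity_timer if activity_timer else 0.0
    return activity_timer, main_total, main_pct

log = [(0, "Adobe Photoshop"), (600_000, "Web Browser"), (780_000, "Adobe Photoshop")]
print(attention_metrics(log, task_end_ms=3_600_000))   # (3600000, 3420000, 95.0)
```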

5 Methods and Results

In order to validate the proposed system, we implemented it at the Caldas das Taipas High School, located in the north of Portugal. For this purpose, a group of volunteer students (9 girls and 13 boys) from the last year of a high school vocational course, with an average age of 17.6 (SD = 1.4), was selected. On different days, students had a class in which each of them accessed an individual computer and two hours were given to complete a task using the Adobe Photoshop application. All participants were proficient with computers, and the rooms, to which each participant was randomly assigned, were equipped with similar computers.

In addition to the captured biometric features (referred to in Sect. 4), each case study was labelled with the respective activity (i.e. video, image, text or audio). Moreover, because the distributions of the biometric features recorded from the different soft sensors (e.g. their mean, median, standard deviation, etc.) are on different scales, it was necessary to apply feature scaling (i.e. normalisation techniques). In this study, the two methods used were min-max normalisation and Z-score normalisation. Min-max normalisation linearly scales a feature to the range [0, 1], based on the minimum and maximum of the set of observed values; in other words, the minimum value of the feature is mapped to 0 while the maximum value is mapped to 1. A Z-score, in turn, stands in for the actual measurement and represents the distance of a value from the mean, measured in standard deviations. This technique is useful when relating different measurement distributions to each other, acting as a "common denominator". A sketch of both scaling methods is given below.
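A minimal sketch of the two scaling methods, using scikit-learn; the library choice and the small feature matrix are assumptions for illustration, since the paper does not state which tooling was used.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[12.0, 300.0], [48.0, 150.0], [30.0, 900.0]])   # illustrative feature matrix

X_minmax = MinMaxScaler().fit_transform(X)    # linearly maps each feature to [0, 1]
X_zscore = StandardScaler().fit_transform(X)  # distance from the mean in standard deviations

print(X_minmax)
print(X_zscore)
```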

With this, several machine learning categorisation methods were used to predict the student's activity through the analysis of his/her behaviour in HCI. Several classifiers were trained and tested in order to determine the most efficient method to categorise the student's activity, taking into account the methods most applied in the scientific literature. The set of classification methods trained and tested was: Support Vector Machine, Nearest Neighbour, Naive Bayes, Neural Network and Random Forest.

As for the validation method, a split validation method was used to determine the classification performance, where 2/3 of the study cases were used for training the classifiers and the remaining 1/3 was used to test them; a sketch of this setup, combining the five classifiers above, is given below. Table 1 presents the set of results for the classifiers' performance.
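A minimal sketch of the validation setup, assuming scikit-learn implementations of the five classifier families and a synthetic stand-in for the collected feature matrix, which is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 48 labelled case studies (4 activity classes).
X, y = make_classification(n_samples=48, n_features=11, n_informative=6,
                           n_classes=4, random_state=0)

# 2/3 of the cases for training, the remaining 1/3 for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=42)

models = {
    "Support Vector Machine": SVC(),
    "Nearest Neighbour": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Neural Network": MLPClassifier(max_iter=2000),
    "Random Forest": RandomForestClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```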

Table 1. Comparative analysis of machine learning categorisation performance.

Looking at the outcome, some conclusions can be drawn: (1) among the trained and tested classification methods, the Random Forest presents the best overall performance, with 87.5% of classifications correct, while the Support Vector Machine presents the worst performance; (2) applying the feature scaling techniques improved the performance of the classifiers by between 6.25% and 25%, with the greatest improvement observed in the neural network classifier; (3) the performance of the classifiers depends on the quality of the features and on the total number of case studies analysed (i.e. 48 case studies).

Given this set of conclusions, the Random Forest method was selected to categorise the student's activity. Additionally, this model was optimised through hyper-parameter optimisation: in order to optimise the Random Forest's classification performance, it was necessary to find the number of leaves/features (between 1 and 11) and the number of trees (in this study, between 1 and 500) that best suit the model and minimise the validation error. For this, an exhaustive grid search was used, in which the model is trained and validated for each combination of parameter values; in the end, the model with the lowest error rate is selected (a sketch of this search is given below). Figure 2 presents the results of this process, showing that the model displays the lowest average error when the number of decision trees is 80. As for the number of leaves/features per decision tree, 9 was the value that presented the best performance. Moreover, the feature relevance of the model shows that the Activity Timer (alltime) is by far the most important feature for predicting the student's activity, followed by the Duration Distance Clicks (ddc), Time Between Keys (tbk), Key Down Time (kdt), Distance Point to Line Between Clicks (dplbc) and Distance Between Click (dbc).
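A minimal sketch of such an exhaustive grid search over the number of trees and features, using scikit-learn's GridSearchCV and synthetic stand-in data; the grid values are illustrative points within the ranges stated above, not the exact search used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 48 labelled case studies (4 activity classes).
X, y = make_classification(n_samples=48, n_features=11, n_informative=6,
                           n_classes=4, random_state=0)

# Every combination of features per split (1-11) and number of trees (within 1-500)
# is trained and cross-validated; the combination with the best score is kept.
param_grid = {
    "max_features": list(range(1, 12)),
    "n_estimators": [1, 10, 50, 80, 100, 200, 500],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```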

Fig. 2. Random forest classifier analysis.

Table 2. Random forest: confusion matrix

The confusion matrix of the Random Forest model is presented in Table 2, where only the Video activity presents misclassifications, in 40% of its cases (i.e. 2/5 cases were misclassified as the Audio editing activity).

6 Conclusions and Future Work

This work proposes a non-invasive approach for a mentoring system based on the behavioural biometric analysis of students' work during high-end tasks. More specifically, the system monitors and analyses mouse dynamics, keystroke dynamics and task activity in order to determine the student's performance. Based on the application of several machine learning categorisation models (shown in Sect. 5), the Random Forest model presented the best categorisation performance, with a success rate of 87.5%, where the misclassified cases were concentrated in the Video activity.

Based on the activity timer we can conclude that: (1) for activity times lower than 75 min, the best categorised style is text; (2) for activity times higher than 90 min, the best categorised styles are image (60%) and audio (40%); (3) for activity times between 75 min and 90 min, the best categorised styles are video (80%) and audio (20%). Furthermore, the mouse dynamics and keystroke dynamics only slightly support the machine learning model in determining the student's activities.

As future work, the research will focus on: (1) increasing the number of available case studies to be analysed, by collecting a greater number of cases of human-machine interaction during the resolution of high-end tasks; (2) increasing the number of quality features that would allow better monitoring of students' performance (e.g. features related to body posture, touch intensity with the mouse, etc.); (3) a detailed analysis of the features that influence the students' performance (e.g. through the correlation analysis of students' class performance with their biometric behaviours); (4) the definition of different student profiles to improve the adaptive learning mechanisms of the platform (i.e. by profiling students into different behaviour clusters, better measures can be applied to each specific profile, improving the performance of intelligent mentoring systems).