1 Introduction

Cognitive performance is an important concept for understanding the cognition level of individuals and their ability to carry out different kinds of tasks using acquired knowledge. Several approaches exist to estimate cognitive performance, including questionnaire-based, physical, and physiological measures. Questionnaire-based measures are personal measures that combine self-reported actions with the judgments of observers. Physical measures include the detection of facial expressions, gestures, and postures, while physiological measures assess the internal features of individuals. Alongside these approaches, the analysis of brain signals, e.g., electroencephalography (EEG), functional near-infrared spectroscopy (fNIR), and functional magnetic resonance imaging (fMRI), can provide useful information about human behavior and physiological abnormalities for estimating the cognitive performance of individuals. Among them, EEG signals are easy to acquire and help to identify relevant features of cognitive performance, so they can be processed to extract features that denote brain states. Because EEG signals are non-stationary, developing sophisticated analyses of them is challenging. Machine learning (ML) enables dynamic analysis of these signals and the extraction of significant features from them. These EEG features can then be analyzed to detect cognitive performance accurately [7, 8].

In previous studies, many ML-based classifiers have been used to investigate cognitive performance through EEG signals and to detect various neurological issues. For instance, linear discriminant analysis (LDA) was used to identify a particular signal band that offers more distinct features in EEG signals [4]. Quadratic discriminant analysis (QDA) is closely related to LDA but estimates a separate covariance matrix for each class, and it has shown excellent performance in classifying real-time datasets. The multilayer perceptron (MLP) extracts dominant features and reduces complexity when identifying abnormalities in EEG signals, for example in epileptic seizure analysis [10] and in the analysis of students' academic emotions [3]. Naïve Bayes (NB) is commonly used in medical and emotional data processing [2, 14] and has been applied to classify EEG signals for detecting cyber-sickness [9]. Support vector machines (SVM) and k-nearest neighbours (KNN) have also been used to investigate EEG signals for different neurological problems as well as for academic emotion analysis [2]. In addition, recurrent neural networks (RNN) have been widely used for EEG data analysis, such as the detection of confused students [11] and of epilepsy [1].

The technical contribution of this work is to assess cognitive performance more efficiently than previous approaches. To this end, we propose a bidirectional multilayer long short-term memory (BML-LSTM) neural network that detects cognitive performance more accurately. It was applied to an open-source EEG dataset of confused students to identify the cognition of individuals. The work combines several data transformation methods with machine learning and deep learning methods. The data transformation methods were applied to the primary EEG dataset to generate several transformed datasets. BML-LSTM was then applied to the primary and transformed datasets and achieved up to around 96% accuracy in identifying confused students. The baseline classifiers described above were also applied to these datasets. The main motive for using these classifiers is to verify the performance of BML-LSTM by comparing their results with it. None of these classifiers exceeded the results of BML-LSTM. The proposed model also outperforms previous works that investigated this confused-student EEG dataset.

2 Proposed Method

To identify cognitive performance from EEG signals, a novel pipeline has been developed (see Fig. 1), which is described in the following subsections.

Fig. 1. Proposed pipeline for cognitive performance detection from EEG signals.

2.1 Data Transformation

Data transformation converts instances from one format to another and represents values in a more distinctive form. In this work, to identify the appropriate composition of the pipeline, we applied distinct transformation methods, namely the discrete wavelet transform (DWT), the fast Fourier transform (FFT), and principal component analysis (PCA), to the primary EEG dataset and generated several transformed datasets. In the previous literature [5], these methods were widely used to transform instances into a suitable format and to enhance the diversity of classification results. For instance, DWT reduces noise by filtering particular coefficients and is used to scrutinize different non-stationary EEG signals [13]. Using FFT, the EEG signals of confused students can be converted from the time domain to the frequency domain, which decreases noise [5]. Furthermore, PCA is applied to EEG signals to reduce dimensionality, complexity, and computational time while retaining most of the variability [6]. Based on this analysis, we applied these methods to the EEG dataset to obtain more diverse results alongside those for the raw dataset.
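The three transformations can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the data are random placeholders, the single-level Haar wavelet is only one possible DWT choice, PCA is computed directly from an SVD, and all function names and the number of retained components are assumptions made for illustration.

```python
import numpy as np

def fft_magnitude(signal):
    """Magnitude spectrum of a real-valued 1-D signal (time -> frequency domain)."""
    return np.abs(np.fft.rfft(signal))

def haar_dwt(signal):
    """Single-level Haar DWT; returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:                      # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def pca_transform(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)             # centre each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Hypothetical data: 100 samples with 11 features (placeholder values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 11))

spectrum = fft_magnitude(X[:, 0])       # frequency-domain view of one feature column
approx, detail = haar_dwt(X[:, 0])      # low-pass and high-pass wavelet coefficients
X_pca = pca_transform(X, n_components=3)
```

Discarding or shrinking small `detail` coefficients is one common way in which a wavelet transform is used for noise reduction; the sketch only computes the coefficients.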

2.2 Bidirectional Multilayer LSTM (BML-LSTM)

To analyse the transformed EEG signals, we propose a BML-LSTM to identify the cognitive ability of the students. A recurrent neural network (RNN) is a recursive neural network in which the output of each layer is fed as input to the next layer. Moreover, the result of a processing node on a certain layer depends not only on the layer's weights but also on a state vector of prior inputs or outputs. An RNN remembers while learning: it uses the same parameters in each calculation and performs the same task at all the hidden layers. Such computation reduces parameter complexity in contrast to other neural networks. In general, the hidden state \(S_t\) at step t of an RNN can be defined as follows:

$$\begin{aligned} S_{t}=A\left( S_{t-1}, x_{t}\right) \end{aligned}$$
(1)

where \(x_t\) is the input instance, \(S_{t-1}\) is the output from the previous step, and A is the activation function. At every hidden layer, each hidden-to-hidden recurrent connection has a weight matrix \(W_s\) and each input-to-hidden connection has a weight matrix \(W_x\). These weights are shared across time. With all the weighted variables, the hidden state can be written as:

$$\begin{aligned} S_{t}=A\left( W_{s} S_{t-1}+W_{x} x_{t}+b\right) \end{aligned}$$
(2)

where \(W_{s} \in \mathbb {R}^{d_{s} \times d_{s}}\), \(W_{x} \in \mathbb {R}^{d_{s} \times d_{x}}\), \(b \in \mathbb {R}^{d_{s}}\), and \(d_s\) and \(d_x\) denote the dimensions of the hidden-state and input vector spaces, respectively.
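The recurrence of Eqs. (1) and (2) can be sketched in a few lines of NumPy. The sketch assumes tanh as the activation A, a zero initial state, and small random weights; the input weight matrix is given shape (d_s, d_x) so that it maps a d_x-dimensional input into the d_s-dimensional state space. The dimensions and the function name are illustrative, not taken from this work.

```python
import numpy as np

def rnn_forward(xs, W_s, W_x, b, activation=np.tanh):
    """Apply S_t = A(W_s S_{t-1} + W_x x_t + b) over a sequence and return all states."""
    state = np.zeros(W_s.shape[0])      # assumed initial state S_0 = 0
    states = []
    for x_t in xs:                      # the same weights are reused at every time step
        state = activation(W_s @ state + W_x @ x_t + b)
        states.append(state)
    return np.stack(states)

# Hypothetical sizes: state dimension 4, input dimension 11, sequence length 6.
d_s, d_x, T = 4, 11, 6
rng = np.random.default_rng(0)
W_s = rng.normal(scale=0.1, size=(d_s, d_s))
W_x = rng.normal(scale=0.1, size=(d_s, d_x))
b = np.zeros(d_s)
xs = rng.normal(size=(T, d_x))

states = rnn_forward(xs, W_s, W_x, b)   # one state vector per time step
```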

The main drawback of the RNN is the vanishing (or exploding) gradient problem. At each time step, the classifier incurs a loss, and gradients carry this information across time steps. During backpropagation, gradients travel from the last layer to the first, and they can shrink or grow along the way. LSTM is an improved version of the RNN that handles this long-term dependency problem. It uses designated hidden states, called cells, that store information for long periods so that particular information is available not only to the immediately subsequent steps but also to later nodes. Removing information from or adding information to a cell state is carefully regulated by gates. The LSTM has three specialized gates: the forget gate (\(f_t\)), the input gate (\(i_t\)), and the output gate (\(o_t\)). The sigmoid (\(\sigma \)) and tanh are the activation functions, where tanh introduces non-linearity and squashes the activations into \([-1,1]\).

$$\begin{aligned} f_{t}=\sigma \left( W_{f} \cdot \left[ S_{t-1}, x_{t}\right] \right) \end{aligned}$$
(3)
$$\begin{aligned} i_{t}=\sigma \left( W_{i} \cdot \left[ S_{t-1}, x_{t}\right] \right) \end{aligned}$$
(4)
$$\begin{aligned} o_{t}=\sigma \left( W_{o} \cdot \left[ S_{t-1}, x_{t}\right] \right) \end{aligned}$$
(5)

The recurrent connection in an LSTM has the form:

$$\begin{aligned} c_{t}=c_{t-1} \otimes f_{t} \oplus \tilde{c}_{t} \otimes i_{t} \end{aligned}$$
(6)

and the cell’s final output has the form:

$$\begin{aligned} S_{t}=o_{t} \otimes \tanh \left( c_{t}\right) \end{aligned}$$
(7)

Here, \(\tilde{c}_{t}\) is the candidate cell state, computed by a fully connected layer with its own weight matrix \(W_c\) and a tanh activation:

$$\begin{aligned} \tilde{c}_{t}=\tanh \left( W_{c} \cdot \left[ S_{t-1}, x_{t}\right] \right) \end{aligned}$$
(8)
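A single LSTM step following Eqs. (3)-(7) can be sketched as below. Bias terms are omitted, as in the equations. For the candidate cell state, the sketch assumes the conventional LSTM form, i.e., a tanh layer with its own weight matrix (called `W_c` here); the weights are random placeholders and the dimensions are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, s_prev, c_prev, W_f, W_i, W_o, W_c):
    """One LSTM step; each weight matrix has shape (d_s, d_s + d_x)."""
    z = np.concatenate([s_prev, x_t])   # [S_{t-1}, x_t]
    f_t = sigmoid(W_f @ z)              # forget gate, Eq. (3)
    i_t = sigmoid(W_i @ z)              # input gate, Eq. (4)
    o_t = sigmoid(W_o @ z)              # output gate, Eq. (5)
    c_tilde = np.tanh(W_c @ z)          # candidate cell state (assumed tanh form)
    c_t = c_prev * f_t + c_tilde * i_t  # cell update, Eq. (6)
    s_t = o_t * np.tanh(c_t)            # cell output, Eq. (7)
    return s_t, c_t

# Hypothetical sizes: state dimension 3, input dimension 2.
d_s, d_x = 3, 2
rng = np.random.default_rng(0)
W_f, W_i, W_o, W_c = (rng.normal(scale=0.1, size=(d_s, d_s + d_x)) for _ in range(4))
s, c = np.zeros(d_s), np.zeros(d_s)
for x_t in rng.normal(size=(5, d_x)):   # run five time steps
    s, c = lstm_step(x_t, s, c, W_f, W_i, W_o, W_c)
```

Because the cell update is additive and gated, the forget gate can keep `c_t` close to `c_{t-1}` over many steps, which is how the cell retains information for long periods.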

BML-LSTM trains two RNNs and generates the output based on both previous and future elements. When the whole time sequence is known, one network is trained on the input sequence and the second network is trained on the time-reversed input sequence, which can significantly increase accuracy. In the proposed model, three BML-LSTM layers are implemented, where the first, second, and third layers contain 5, 10, and 5 neural units, respectively. The tanh function is used as the activation function for the hidden layers. These states are linked to a fully connected layer with a sigmoid function, and Adam is used as the optimizer. The model therefore produces an output of 0 or 1 and provides a robust and stable means of estimating cognitive performance.
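The data flow of a stacked bidirectional network can be sketched in NumPy as follows. This is only an illustration of the forward and time-reversed passes, not the trained model: a plain tanh recurrence stands in for the LSTM cell, the weights are random and untrained, the two directions are merged by concatenation (the merge mode is an assumption), and the sequence length and input dimension are hypothetical. Only the layer widths 5, 10, and 5 and the sigmoid output come from the description above.

```python
import numpy as np

def run_direction(xs, W_s, W_x, reverse=False):
    """Run a tanh recurrence over xs; if reverse, process the time-reversed
    sequence and flip the outputs back so they align with the input order."""
    seq = xs[::-1] if reverse else xs
    state = np.zeros(W_s.shape[0])
    out = []
    for x_t in seq:
        state = np.tanh(W_s @ state + W_x @ x_t)
        out.append(state)
    out = np.stack(out)
    return out[::-1] if reverse else out

def bidirectional_layer(xs, units, rng):
    """Forward and backward passes with separate weights, concatenated per step."""
    d_x = xs.shape[1]
    outs = []
    for reverse in (False, True):
        W_s = rng.normal(scale=0.1, size=(units, units))
        W_x = rng.normal(scale=0.1, size=(units, d_x))
        outs.append(run_direction(xs, W_s, W_x, reverse=reverse))
    return np.concatenate(outs, axis=1)          # shape (T, 2 * units)

rng = np.random.default_rng(0)
xs = rng.normal(size=(8, 11))                    # hypothetical: 8 time steps, 11 features

h = xs
for units in (5, 10, 5):                         # three stacked bidirectional layers
    h = bidirectional_layer(h, units, rng)

# Final forward state (last step) and final backward state (first step).
summary = np.concatenate([h[-1, :5], h[0, 5:]])
w_out = rng.normal(scale=0.1, size=summary.size) # untrained output weights
prob = 1.0 / (1.0 + np.exp(-(summary @ w_out)))  # sigmoid output unit
label = int(prob >= 0.5)                         # thresholded to 0 or 1
```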

2.3 Baseline Classifiers

To justify the performance of the proposed BML-LSTM model, several baseline classifiers, including LDA, QDA, MLP, NB, SVM, KNN, and RNN, were applied to the confused-student EEG dataset.

2.4 Evaluation Metrics

A confusion matrix describes the performance of a classification model on test data for which the true values are known. It gives the counts of correct and incorrect predictions, broken down by class. For positive and negative classes, the matrix is defined by the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

  • Accuracy: It denotes the efficiency of the classifier in terms of the probability of predicting true values.

    $$\begin{aligned} \text{ Accuracy } =\frac{TP+TN}{(TP+TN+FP+FN)} \end{aligned}$$
    (9)
  • AUC: It indicates how well the positive class is separated from the negative class.

    $$\begin{aligned} AUC=\frac{\text{ TP } \text{ rate }+ \text{ TN } \text{ rate }}{2} \end{aligned}$$
    (10)
  • F-measure: It is the harmonic mean of precision and recall.

    $$\begin{aligned} \text{F-measure}=\frac{2 \times \text{precision} \times \text{recall}}{\text{precision}+\text{recall}} =\frac{2\, \mathrm {TP}}{2\, \mathrm {TP}+\mathrm {FP}+\mathrm {FN}} \end{aligned}$$
    (11)
  • G-mean: The geometric mean (G-mean) is the square root of the product of the class-specific sensitivities; it creates a trade-off between maximizing the accuracy on each class and keeping these accuracies balanced.

    $$\begin{aligned} \text{ GMean }=\sqrt{(\text{ TPrate } \times \text{ TNrate})} \end{aligned}$$
    (12)
  • Sensitivity: The proportion of actual positives that are correctly identified, measured using the following equation.

    $$\begin{aligned} \text{ Sensitivity } =\frac{TP}{(TP+FN)} \end{aligned}$$
    (13)
  • Specificity: The proportion of actual negatives that are correctly identified, determined using the following equation.

    $$\begin{aligned} \text{ Specificity } =\frac{TN}{(TN+FP)} \end{aligned}$$
    (14)
  • False negative rate: The ratio of false negatives to actual positives, also called the miss rate.

    $$\begin{aligned} \text{ False } \text{ Negative } \text{ Rate } =\frac{FN}{(FN+TP)} \end{aligned}$$
    (15)
  • False positive rate: The ratio of false positives to actual negatives, also called the fall-out.

    $$\begin{aligned} \text{ False } \text{ Positive } \text{ Rate } =\frac{FP}{(FP+TN)} \end{aligned}$$
    (16)
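Equations (9)-(16) can be computed directly from the four confusion-matrix counts, as in the following sketch. The counts in the usage example are hypothetical and are not results from this work; the function name is illustrative.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (9)-(16) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                       # TP rate, Eq. (13)
    specificity = tn / (tn + fp)                       # TN rate, Eq. (14)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (9)
        "auc": (sensitivity + specificity) / 2,        # Eq. (10)
        "f_measure": 2 * tp / (2 * tp + fp + fn),      # Eq. (11)
        "g_mean": math.sqrt(sensitivity * specificity),  # Eq. (12)
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": fn / (fn + tp),         # Eq. (15)
        "false_positive_rate": fp / (fp + tn),         # Eq. (16)
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Note that the false negative rate equals one minus the sensitivity, and the false positive rate equals one minus the specificity, so the two error rates carry the same information as Eqs. (13) and (14).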

2.5 Dataset Description

The dataset was obtained from Wang et al. [14], who collected EEG signals from 10 students watching MOOC videos. They prepared 20 online learning videos in two categories: 10 covered normal, conceptual material and the other 10 covered unusual or hard topics. For the critical videos, a 2 min clip was taken from the middle of each video, which made them more confusing for the students. Each student completed 10 sessions, where a first lesson of 30 s was given to refresh their mind. In the next lesson, students wore a wireless MindSet EEG device and tried to learn as much as possible from the videos, while the device captured activity around the frontal lobe. The data points were sampled every 0.5 s. The features comprise a proprietary measure of mental focus (attention), a proprietary measure of calmness (meditation), the raw EEG signal, and the power spectrum of the delta band (1–3 Hz), theta (4–7 Hz), alpha1 (lower 8–11 Hz), alpha2 (higher 8–11 Hz), beta1 (lower 12–29 Hz), beta2 (higher 12–29 Hz), gamma1 (lower 30–100 Hz), and gamma2 (higher 30–100 Hz). After each session, each student rated his/her level of confusion on a scale of 1–7, where 1 indicated least confusing and 7 most confusing. Moreover, three students observed each student's attitude and graded it on the same scale. Again, four observers each witnessed 1–8 students in that work. Finally, these levels were quantized into two classes that indicate whether the student is confused or not.

3 Results and Discussion

In this work, we used the scikit-learn machine learning library [12] in Python to transform and classify the confused-student EEG dataset using 10-fold cross-validation. The performance of each classifier was then evaluated using the different evaluation metrics.
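The splitting scheme behind 10-fold cross-validation can be sketched in plain Python. This is a stand-in that shows the idea rather than the code used in this work; scikit-learn's `KFold` class provides equivalent functionality. The shuffling, the seed, and the sample count are assumptions made for illustration.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumed: shuffle before splitting
    folds = [indices[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# Hypothetical dataset size; each sample is used for testing exactly once.
splits = list(k_fold_indices(103, k=10))
for train_idx, test_idx in splits:
    # A classifier would be fitted on train_idx and scored on test_idx here;
    # the per-fold scores are then averaged.
    pass
```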

Table 1. Performance Comparison with Baseline Models

3.1 Overall Performance of the Model

We applied BML-LSTM along with the baseline classifiers to the raw dataset. BML-LSTM achieves the highest accuracy (96%) and the lowest miss rate (4.50%) and fall-out (4.48%) (see Table 1). Its other evaluation metrics are similar to its accuracy. RNN shows 87% accuracy, and its other metrics give similar outcomes. After RNN, LDA shows the next-best results, with 59% accuracy, F-measure, and sensitivity and 60% AUC, G-mean, and specificity. KNN shows 56% for all of its evaluation metrics except the error rates. The results of the other classifiers, QDA, NB, MLP, and SVM, are also reported for the different evaluation metrics (see Table 1). Unlike the other neural networks, i.e., BML-LSTM and RNN, MLP does not achieve high accuracy in this work. SVM shows the lowest accuracy (51%), with similar values for the other evaluation metrics except the error rates. In addition, the ROC curves and AUC scores in Fig. 2 give further insight into the classification of the EEG data of confused students.

Fig. 2. ROC Curves of BML-LSTM and Different Classifiers for Raw Signals

Table 2. Effect of Preprocessing in the Performance of the BML-LSTM model.

3.2 Effect of Preprocessing on Overall Model Performance

The classification results of BML-LSTM for the primary and transformed datasets are shown in Table 2. This analysis indicates how preprocessing steps such as data transformation methods can affect the results of the proposed model. On the DWT-transformed dataset, performance is less satisfactory than on the raw data: BML-LSTM shows 70% accuracy and AUC. On the FFT-transformed dataset, the proposed model achieves 59% accuracy and AUC. According to Table 2, the FFT-transformed dataset yields the lowest results in this work. In contrast, BML-LSTM shows better outcomes, around 89%, on the PCA-transformed dataset. It performs better than on the DWT- and FFT-transformed datasets, but it does not exceed the performance of BML-LSTM on the raw EEG signals.

In this work, the proposed BML-LSTM shows better results than the baseline classifiers on the primary EEG dataset. We also reported the performance of the proposed model on the transformed EEG datasets (Table 2). Several previous studies analyzed the data of students confused by watching educational video clips that induce different levels of confusion. When we compared the outcomes of the current study with previous works, we found that most of them did not examine their results from a preprocessing perspective. In the current study, we applied the most widely used data transformation methods to observe how they perform on the confused-student EEG dataset. However, the classification results for the transformed datasets are not better than those for the raw EEG dataset. With only 11 features, feature selection methods also did not work well. Overall, our proposed BML-LSTM shows the best classification result compared with previous studies. The comparison of the current work with other studies is presented in Table 3. Although we used a small EEG dataset, cross-validation helps the proposed model avoid overfitting and improves its generalization ability.

Table 3. Comparative Study with Previous Works

4 Conclusion

Cognitive performance is a measure of the effective capabilities that an individual can bring to bear in different circumstances. It can be hampered for different reasons, and the associated risk factors need to be identified. For instance, EEG signals can record the brain's electrical activity during the learning process and reveal the confusion of students through scrutiny of the features extracted from the signal sub-bands. ML methods have brought significant gains in classifying EEG signals. When learning through MOOC videos, confusion occurs due to the lack of direct communication with mentors. With the increasing popularity of MOOC providers, methods are required to identify and reduce such drawbacks. In this work, the proposed BML-LSTM shows 96% accuracy in classifying confused and non-confused students by analyzing their EEG signals. It achieves the best result compared with the baseline classifiers as well as existing works. To categorize confused students, we used an open-source EEG signal dataset that is not very large. In future, we will gather more EEG data to explore various confusion-related activities and generate further psychological outcomes.