1 Introduction

Attention can be described as focussing cognitive resources on information while avoiding distraction [1]. To determine whether a person is attentive, facial expressions and/or eye movements may be observed manually or by automated methods. Zaletelj proposed a novel approach to assessing students’ attention in classrooms in which a Kinect One sensor was used to capture students’ body movements and facial features [2]. Zhang et al. [3] collected information on students’ attention using wearable devices. These techniques share a limitation: they fail to respond when a shift in attention produces no visible change in eye/head movement or facial expression. This led researchers to apply Brain–Computer Interfaces (BCI) to attention analysis. A BCI is a system that gives its users communication and control channels that do not depend on the brain’s normal output channels of peripheral nerves. A unique feature of BCI-based communication is that no traditional means of communication is required. A BCI can be used to acquire both the emotional and the cognitive state of a person. The signals from the brain are acquired with the help of specialised electrodes, which can be invasive, partially invasive or non-invasive. The first two types require a surgical procedure to place the electrodes inside the brain. Non-invasive electrodes are placed on top of the head and are popular because they are easy to use and cause no long-term impact. The raw EEG signals from the brain are passed to a feature extraction stage; this work used the Fast Fourier Transform to extract the features. The extracted features are then passed to a mutual information-based feature selection algorithm.

Beyond the many applications of BCI, such as medical research and biofeedback, it can also be used to improve cognitive performance. EEG is a voltage signal that measures the neural activity of the brain, and this activity fluctuates with cognitive activity and mental state [4]. Sezer et al. [5] used a Support Vector Machine (SVM) to classify EEG signals to track attention and obtained an accuracy of 70.62%. Hassan et al. [6] devised a novel attention recognition model with advanced machine learning algorithms, obtaining data from a single-channel EEG device, with an accuracy of 89%. Certain disorders such as attention deficit hyperactivity disorder (ADHD) can be detected effectively with the help of BCI technology [7]. ADHD is a neurobehavioural disorder which can cause lack of attention and focus, along with other issues in controlling normal behaviour [8]. Cho et al. [9] showed that EEG biofeedback can be used to enhance attention in children suffering from ADHD. Another group of stakeholders of EEG-based attention detection is visually impaired students, with the difference that an auditory stimulus, rather than a visual one, should be presented to them [10]. The application of deep learning models to classifying EEG data has been emerging at a fast pace in recent years [11, 12]. Deep neural networks are widely used with high success rates in areas such as bioinformatics, medical imaging and health monitoring [13], especially in the recognition of mental states such as Alzheimer’s disease [14] and depression [15, 16]. Toa et al. used deep learning, combined with eye gaze, to analyse brain signals for attention and obtained an accuracy of 92% [17].

In real life, however, a change in attention is not necessarily accompanied by a change in eye gaze or head movement. Camera-based systems rely on changes in facial expression to recognise a lack of attention, yet a person’s attention may be lost without any change in facial expression, while another person may maintain a neutral expression even when highly attentive. A BCI measures signals directly from the brain, without relying on eye gaze or facial expression, and therefore gives an objective measure of attention. BCI-based systems can also account for individual variability in the manifestation of attention. In addition, when attention analysis is required in domains other than education, for instance in medical research or rehabilitation, using face detection may amount to a privacy breach. Nevertheless, there is a lack of research exploring brain waves alone for the development of an attention recognition model without compromising accuracy. With this in mind, this research is devoted to the development of a deep learning neural network model for analysing brain features in an advanced, effective and accurate way.

A BCI collects only brain signals and, unlike camera-based systems, does not expose the face, surroundings or audio of its users. It typically requires the consent and engagement of the user to work, so users retain control over when and how the data are collected. This is a significant advantage over camera-based systems that may collect images or even videos without user consent. BCI-based systems are less susceptible to surveillance because they primarily collect brain data and no cameras are involved. Brain data are also easy to anonymise, as they include no facial or audio information, which protects user privacy, especially in medical and research settings. A further advantage of BCI is data minimisation: it collects only specific brain signals, so the overall amount of data is small compared to continuous video recordings.

The rest of the paper is organised as follows: Sect. 2 reviews state-of-the-art studies in our research area. Section 3 describes the methods used to carry out the research, including the pre-processing techniques, feature extraction, dataset creation and the machine learning algorithms used in the study. Section 4 presents the results, and Sect. 5 concludes the paper and outlines future work.

2 Related work

In this section, a summary of previous studies in our research field is provided. The section is structured as follows: first, we present various works in attention analysis; then we discuss the use of machine learning in attention detection. The next sub-section presents works which used BCI for the analysis of attention. In the final sub-section, we present the use of SSVEP BCI as a communication paradigm between humans and an external device, doing away with conventional interaction methods such as keyboard or voice commands.

2.1 Attention analysis methods

In Zaletelj’s model, head motion, pen motion and visual focus were integrated, producing a multimodal system for analysing students’ attention [2]. Information on students’ behaviour was collected with the help of cameras, gyroscopes and accelerometers, and a machine vision-based approach was used to obtain good estimations of manual ratings. Automated analysis was used to improve correlations between manual ratings and post-test variables. Farhan et al. [18] presented the use of an Internet of Things (IoT) framework in attention analysis. They proposed an Attention Scoring Model (ASM) whose algorithm can be implemented in any programming language: a camera monitors the students’ activities while they watch a video lecture, a face recognition score is logged when a face is detected, and an eye detection score is logged when the eyes are open. Goldberg et al. [1] propose a proof of concept in which a machine vision-based approach is used to analyse students’ engagement or disengagement in class; the authors extracted gaze direction, head pose and facial expressions and performed an automated analysis. A pilot study on camera-based attention systems was conducted by Renawi et al. [19], which used a webcam, a standard computer and computer vision algorithms to estimate the attention level of students in a classroom. All these works used changes in facial expressions captured by camera to detect inattentive states. In most real-life scenarios, a loss of attention, whether deliberate or not, may not be reflected in a change of facial expression. It is therefore evident that techniques other than camera-based ones are required for analysing attention.

2.2 Use of machine learning in attention detection

Machine learning has been used for the classification of attention in many works. Numerous works which assessed students’ attention with the help of machine learning techniques have been reviewed by Villa et al. [20]. Li et al. [21] proposed a machine learning-based approach, a novel multimodal assistant system, to infer the attention of students during formative assessment; they used K-Nearest Neighbours for real-time recognition of attention by developing a Self-Assessment Manikin (SAM) model. Compared to neural networks, simpler methods such as support vector machines (SVM) can give shorter training times and speedy convergence, but one such approach obtained an average accuracy of only 57.03% [22]. Thus, using simpler methods for attention analysis tends to lower accuracy.

2.3 Use of BCI in human attention detection

Recently, brain waves have been used for recognising the emotional states of an individual. Electroencephalography (EEG) sensors are commonly used to capture the brain waves, and advanced machine learning techniques are used to recognise the attention level. One of the earliest attempts was the work by Hassan et al. [6], which used frequency-decomposed EEG to devise a novel machine learning attention recognition model; a hybrid model combining a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) was used for sequence classification. A study of students’ attention levels in real classroom settings was carried out by Sezer et al. [5] with the help of the NeuroSky MindWave, a commercial EEG device used to measure brain waves. The results of this study indicated that teaching methods using digital media increased attention compared to lectures without video or PowerPoint presentations. An SVM was used to classify EEG data collected from students in a classroom with the help of mobile sensors [23]; the classification accuracy obtained was 70.62%. The use of artificial intelligence in multilevel attention recognition is explored by Parui et al. [24].

2.4 Use of SSVEP BCI as a communication paradigm

BCIs can be classified as endogenous and exogenous. Endogenous BCIs allow users to intentionally modulate neuronal activity, whereas exogenous BCIs depend on a stimulus applied externally to the user. The former includes paradigms such as Motor Imagery; SSVEP and P300 paradigms fall under the latter. A study of both kinds is presented by Ravi et al. [25]. Our work focusses on SSVEP-based BCI. Visual Evoked Potentials (VEP) are widely used in EEG-based BCIs, and the design of a suitable visual stimulator plays an important role in their use. Wang et al. proposed the design of a visual stimulator for SSVEP BCI [26]; their approach was to use computer monitor flickers to elicit steady-state visual evoked potentials at a flexible frequency. BCIs are used by people of all ages. The dependence of BCI performance on the age of its users is reviewed by Volosyak et al. [27], who concluded that SSVEP BCI performance drops in people of advanced age and suggested that GUIs should be modified for elderly users. SSVEP-based BCI can be used in neuro-orthosis for cases such as tetraplegia [28] and for controlling robotic arms [29] and robotic wheelchairs [30, 31]. It can also be combined with motor imagery [32], electromyography [33], eye gaze [34], event-related synchronisation [35] or event-related desynchronisation [36]. When a BCI is combined with another system, or when more than one BCI is involved, the system is called a hybrid BCI [37]. Numerous attempts have been made to improve the performance of SSVEP BCI, such as task-discriminant components [38] and other optimisation techniques [39]. From all the related work, it is clear that an accurate attention recognition model is needed in the diverse domains in which human cognition plays a role.

3 Materials and methods

A BCI device establishes communication between the human brain and an external device. The architecture of the proposed BCI system is presented in Fig. 1. The objective of this system is to acquire EEG signals from the user’s brain, pre-process them, select the relevant features according to the Human Attention Recognition algorithm, and classify the signals using a deep learning classifier to recognise attention levels. An EEG-based Brain–Computer Interface is used to estimate the mental attention states of humans. A Human Attention Recognition Algorithm is developed, and a 10-layered deep learning neural network is used for classification. The task is to estimate the attention of students using machine learning. The benchmark dataset is the publicly available Confused Student EEG brainwave data from Kaggle, which consists of EEG data collected from 10 volunteers who watched MOOC video clips [40]. Each student watched 10 videos, giving 100 data points. The dataset also contains demographic and video data, but this work used only the EEG data as the benchmark. The creators of the dataset state that it is well suited for binary classification. It contains the video ID, subject ID, EEG frequency bands, and both user-defined and pre-defined labels.

Fig. 1
figure 1

Block diagram of the human attention recognition system

The electroencephalography (EEG) signal measures the electrical activity of the brain [41]. EEG uses either dry or wet electrodes, depending on the application. Wet electrodes come with an abrasive paste and an electrolyte gel used to reduce the skin impedance to a range of 5–20 kΩ, an acceptable value compared to the MΩ range without the gel. The paste and gel are minimally invasive and harmless, but they are sticky and make the scalp dirty. Moreover, as the gel dries up, its transductive properties disappear, which makes wet electrodes unsuitable for long-term measurements. Therefore, in recent decades the use of dry electrodes, which resolve the limitations of wet ones, has increased [42].

This work used an 8-channel headset in which dry electrodes are installed in Ultracortex nodes. EEG signals were recorded with the Ultracortex Mark IV, a 3D-printable headset. Six of the electrodes, called spikey electrodes, are designed for areas of the head with hair, and the other two, called non-spikey electrodes, are designed for the forehead. The electrodes are placed in a 3D-printed headset. Five additional comfort units without electrodes were used to distribute the weight of the headset. An OpenBCI Cyton 8-channel board was used for EEG signal acquisition. The Ultracortex Mark IV electrodes are dry, so they require neither conductive gel nor adhesive paste, and no skin preparation is needed. Two ear clips act as the reference electrodes. The electrodes were placed according to the international 10–20 standard for EEG electrodes [43].

3.1 EEG recording

The 10–20 EEG system was introduced more than three decades ago by Homan and has since been the standard for locating EEG electrodes on the scalp [41]. It measures external cranial landmarks to locate the electrodes, based on the assumption that scalp electrode locations and the underlying cerebral structures maintain a consistent correlation. For an 8-channel system, the standard specifies the electrode positions described in Table 1. The electrodes mounted on the headset are shown in Fig. 2; it can be worn comfortably by a user while his/her EEG data are recorded. The acquired signals were visualised and recorded using the OpenBCI Graphical User Interface.

Table 1 Electrode names and their positions on the scalp
Fig. 2
figure 2

Experimental setup

The voltage levels from each of the electrodes, along with accelerometer data and timestamps, are automatically saved as a text file by the OpenBCI application.
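As an illustration, the following is a minimal sketch of loading such a recording with pandas. The file path, the ‘%’ comment prefix and the channel column labels are assumptions based on typical OpenBCI Cyton text exports, not details taken from this paper.

```python
import pandas as pd

# Hypothetical recording path; OpenBCI text exports typically begin
# with '%'-prefixed header lines describing the session (an assumption).
PATH = "Recordings/OpenBCI-RAW-session.txt"

# Skip the header lines and parse the comma-separated samples.
df = pd.read_csv(PATH, comment="%")

# Keep the eight EEG channel columns; exact labels vary between GUI
# versions, so treat this filter as a placeholder.
eeg_cols = [c for c in df.columns if "EXG Channel" in c][:8]
eeg = df[eeg_cols]
print(eeg.describe())
```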

3.2 Experimental setup

The Kaggle dataset provided pre-processed data which could be applied directly to the deep learning model. To generalise the model, we created our own dataset, in which eye blinks and power-supply noise were eliminated using filters. The acquired raw data were converted into the frequency sub-bands Alpha1, Alpha2, Beta1, Beta2, Gamma1, Gamma2, Delta and Theta. Before the experiment, the participants were given a brief introduction to the nature of the experiment. They were seated in a cabin with a table, a chair, a laptop and a mobile phone. They then put on the EEG equipment with the help of one of the authors and were asked to relax so that they could get used to it. Each participant watched one 2-minute video per day; there were 10 such videos, so the experiment took 10 days per participant. While the participants watched the videos, the EEG signals from their brains were acquired and recorded. The collected raw EEG was input to the deep learning network for binary classification. The deep learning model is explained in detail in a subsequent sub-section.

Given the EEG data from 10 participants, our task was to determine their attention using deep learning methods. The participants were assigned to watch 10 videos with pre-defined labels classifying them as ‘easy to focus’ (label 1) or ‘difficult to focus’ (label 2). The easy videos covered topics familiar to an engineering student, whereas the difficult ones were taken from the middle of video clips on unfamiliar topics, with the introduction removed. The participants wore an 8-channel EEG headset connected to the OpenBCI software, which was used to extract the focus data. Ten healthy male volunteers, each with normal or corrected-to-normal vision, participated in the experiment. The videos had an average length of 2 min. The volunteers wore the Ultracortex Mark IV headset while watching the videos, and the EEG waves were recorded with the help of the OpenBCI software. All volunteers were made aware of the procedure and purpose of the study. Before the experiment commenced, consent was obtained from each participant, and once the signals were acquired, the participants’ names were anonymised so that their privacy was not affected. The data have not been made public and are available from the authors only, ensuring their ethical use. Data were collected from 10 volunteers watching 10 videos, giving 100 data points in more than 12,000 rows. The sampling frequency was 256 Hz. Each volunteer watched one video per day. The experiments were monitored by one of the authors to ensure that no significant disruptions took place.

We used K-means to cluster the data because it guarantees convergence. Although the number of target classes is known to be 2, we verified this using the elbow method, a graphical technique for finding the optimal number of clusters k [45]. In this method, variation is plotted against the number of clusters, and the optimal value is picked at the elbow of the curve. Figure 4 shows variation versus the number of clusters; the elbow of the curve was obtained at k = 2. The hierarchical model, Gaussian Mixture Model (GMM) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) were also applied, but K-means gave better performance in demarcating the data points.
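A minimal sketch of this elbow procedure with scikit-learn is given below; it assumes the EEG features are already arranged in a samples-by-features matrix X, and the loop range is an illustrative choice.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_curve(X, k_max=8):
    """Plot within-cluster variation (inertia) against k."""
    inertias = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias.append(km.inertia_)  # total within-cluster variation
    plt.plot(range(1, k_max + 1), inertias, marker="o")
    plt.xlabel("Number of clusters k")
    plt.ylabel("Variation (inertia)")
    plt.show()
    return inertias

# With the attention data, the bend of this curve falls at k = 2,
# matching the attentive/inattentive split reported above.
```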

Fig. 3
figure 3

Flowchart of the proposed algorithm

3.3 Data pre-processing

We used a mutual information-based feature selection algorithm to determine the mental attention states of subjects from the EEG signals acquired from their brains using a BCI device. A band-pass filter was applied to retain frequencies in the range 1.5–50 Hz, and a notch filter at 50 Hz eliminated noise due to the supply power. K-means clustering, a classical algorithm for dividing data into classes or clusters [44], was then applied.
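A minimal sketch of this filtering stage with SciPy is shown below. The 1.5–50 Hz pass band, the 50 Hz notch and the 256 Hz sampling rate follow the text; the filter order and quality factor are our assumptions.

```python
from scipy.signal import butter, filtfilt, iirnotch

FS = 256.0  # sampling frequency in Hz, as reported for this study

def preprocess(eeg, fs=FS):
    """Band-pass 1.5-50 Hz, then notch out 50 Hz mains interference."""
    # 4th-order Butterworth band-pass (the order is an assumption).
    b, a = butter(4, [1.5, 50.0], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, eeg, axis=0)
    # Notch filter centred at 50 Hz; Q = 30 is a typical choice.
    bn, an = iirnotch(50.0, Q=30.0, fs=fs)
    return filtfilt(bn, an, filtered, axis=0)
```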

Our work combines feature–feature mutual information and feature–class mutual information to obtain an optimal subset of relevant features. MI-based feature selection was chosen because it has proven effective across multiple classifiers and multiple datasets: Hoque et al. evaluated an MI-based feature selection method over 12 datasets of varying dimensionality and compared its performance with 8 other feature selection methods, such as chi-square and symmetric uncertainty [46]. Aci et al. [11] classified EEG signals using maximised mutual information for the classification of emotions.

figure a

Mutual Information-based Feature Selection

For the dataset D, let F be the feature set

$$F= \left\{{f}_{1},{f}_{2},{f}_{3},\dots ,{f}_{d}\right\}$$

An optimal subset F′ ⊂ F of the features is to be selected such that a classifier trained on F′ gives maximum classification efficiency.

The proposed method uses Mutual Information theory to find a feature subset of maximum relevance and minimum redundancy.

For any pair \(\left({f}_{i},{f}_{j}\right)\in F^{\prime}\),

The formal definition of Mutual Information is given as follows:

Mutual Information

$$I\left({f}_{i},{f}_{j}\right)= \sum_{{f}_{i},{f}_{j}\in F} p\left({f}_{i},{f}_{j}\right)\,{\text{log}}\,\frac{p\left({f}_{i},{f}_{j}\right)}{p\left({f}_{i}\right)\,p\left({f}_{j}\right)}$$
(1)

where p(fi, fj) is the joint probability density function of fi and fj, and p(fi) and p(fj) are the marginal probability density functions of fi and fj.

The minimum redundancy condition

For the feature set F,

$${\text{min}}\; \frac{1}{{\left|F\right|}^{2}}\sum_{{f}_{i},{f}_{j}\in F}{\text{I}}\left({f}_{i},{f}_{j}\right)$$
(2)

where I (fi, fj) is the mutual information (MI) between the features fi and fj.

The Maximum relevance condition

$${\text{max}}\; \frac{1}{\left|F\right|}\sum_{{f}_{i}\in F}I\left(C,{f}_{i}\right)$$
(3)

where C denotes the classes and I(C, fi) is the mutual information between the feature fi and class C, i.e. the relevance of fi for class C.

The final feature set should satisfy both conditions simultaneously:

$$I\left({f}_{i},{f}_{j}\right)= \sum_{{f}_{i},{f}_{j}} p\left({f}_{i},{f}_{j}\right)\,{\text{log}}\,\frac{p\left({f}_{i},{f}_{j}\right)}{p\left({f}_{i}\right)\,p\left({f}_{j}\right)}$$
(4)

Feature–feature mutual information is calculated for all 14 features; a selected feature must have a value lower than the threshold, which is set to 1. Feature–class mutual information should be high for each class C. The MI score is calculated by combining both conditions.

The calculation of mutual information involves computational difficulties, and in higher dimensions it becomes increasingly complex. A solution to this problem is to use non-parametric entropy estimators to compute mutual information. One such estimator is Rényi’s entropy, which can be used to formulate the mutual information between features and target classes. A transformation function is applied to the feature–feature and feature–class mutual information, and a discrete value is assigned to each feature.

Entropy is the uncertainty in a variable. A high entropy implies that every event has almost the same probability of occurrence, whereas a low entropy implies widely differing probabilities. The information one random variable possesses about another random variable is termed the mutual information of those variables [60]. This measure is used in feature selection to quantify the relevance of a feature subset with respect to the target class; discarding irrelevant features and selecting relevant ones helps to reduce computation time. The mutual information-based feature selection used in this work corresponds to finding an optimal feature subset with minimum redundancy and maximum relevance (mRMR). It finds applications in fields such as machine learning, information theory and image processing, in feature extraction and clustering analysis.
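For illustration, a greedy sketch of this minimum-redundancy/maximum-relevance idea using scikit-learn’s mutual information estimators is given below. The greedy strategy, the score function and the variable names are our own illustration rather than the authors’ exact algorithm; the redundancy threshold of 1 follows the text, and the features are assumed to be discretised (e.g. by the median rule given below).

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, n_select, redundancy_threshold=1.0):
    """Greedily pick features with high feature-class MI and low
    feature-feature MI (a sketch of the mRMR criterion)."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]  # start from most relevant
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Mean feature-feature MI against already-chosen features;
            # assumes columns of X hold discretised values.
            red = np.mean([mutual_info_score(X[:, j], X[:, s])
                           for s in selected])
            if red >= redundancy_threshold:
                continue  # too redundant, per the threshold of 1
            score = relevance[j] - red  # relevance minus redundancy
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break
        selected.append(best)
    return selected
```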

Rényi’s entropy is a generalised entropy formulation of order α. Selecting α = 2 makes the measure positive for all values of i and j, so we use Rényi’s quadratic formulation for measuring entropy.
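For reference, the Rényi entropy of order α for a discrete variable with probabilities \({p}_{i}\), and its quadratic (α = 2) special case used here, are:

$${H}_{\alpha }(X)= \frac{1}{1-\alpha }\,{\text{log}}\sum_{i}{p}_{i}^{\alpha }, \qquad {H}_{2}(X)= -{\text{log}}\sum_{i}{p}_{i}^{2}$$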

Let μ be the median and σ the standard deviation of a particular feature. The feature is discretised as

$$ f^{\prime} \left( i \right) = \begin{cases} 1 & \text{if}\; i \ge \mu + \sigma /2 \\ 0 & \text{if}\; i < \mu - \sigma /2 \end{cases} $$

The entropy or uncertainty of a class label is given as follows:

$$H(C)= - \sum_{c} P(c)\,{\text{log}}\, P(c)$$
(5)

For a feature vector y, the uncertainty of the class identity is known as the conditional entropy [17].

$$H(C|Y) = -\int_{y} p(y) \sum_{c} p(c|y)\,{\text{log}}\, p(c|y)\,{\text{d}}y$$
(6)

The amount by which the class uncertainty is reduced by observing the feature vector is called the mutual information, given by:

$$I\left(C,Y\right)= H\left(C\right)- H(C|Y)$$
(7)

$$= -\sum_{c} P(c)\,{\text{log}}\, P(c) + \int_{y} p(y) \sum_{c} p(c|y)\,{\text{log}}\, p(c|y)\,{\text{d}}y$$

where

$$p(c,y) = p(c|y)\,p(y) \quad {\text{and}} \quad P(c)= \int_{y} p(c,y)\, {\text{d}}y$$
(8)

Lemma 1

When I(C, Y) = 0, the joint density of C and Y can be factored as the product of the marginal densities P(c) and p(y).

Proof of Lemma 1

$$ I\left( {C,Y} \right) = \int_{y} \sum_{c} p\left( {c,y} \right)\, {\text{log}}\, \frac{p\left( {c,y} \right)}{P\left( c \right)\, p\left( y \right)}\, {\text{d}}y $$
(9)

When C and Y are independent of each other,

$$I(C,Y) = 0$$

i.e.,

$$\int_{y} \sum_{c} p(c,y)\,{\text{log}}\, \frac{p(c,y)}{P(c)\, p(y)}\,{\text{d}}y = 0$$

which implies

$${\text{log}}\, \frac{p(c,y)}{P(c)\, p(y)} = 0$$
$$ p\left( {c,y} \right) = P\left( c \right)p\left( y \right) $$
(11)

This means that the joint density of C and Y can be factored as the product of the marginal densities P(c) and p(y).

To find a linear transform with reduced dimensions, there should be a subspace \({\mathbb{R}}^{d}\) with d < D, spanned by the columns of the D × d matrix W, where \({W}^{T}W=I\). The transform is chosen to maximise the mutual information between the projected features and the class labels:

$$W= \underset{W}{{\text{argmax}}}\; I\left(\left\{{c}_{i},{y}_{i}\right\}\right), \quad {y}_{i} = {W}^{T}{x}_{i}$$
(10)

To maximise the mutual information between the feature \({y}_{i}\) and class C, the partial derivative with respect to \({y}_{i}\) is equated to zero [30]. Letting \({J}_{p}\) be the number of samples in a class \({C}_{p}\), the derivative \(\partial {I}_{T}/\partial {y}_{i}\) decomposes into a sum of three terms.

For visualisation of the EEG data, we used the OpenBCI GUI. OpenBCI has a built-in feature that saves the electrode readings in a text file, with a timestamp added for each entry. During data acquisition, if the recording option is turned on, a text file is created, by default, in a folder named ‘Recordings’ inside the OpenBCI folder of the Documents folder on a Windows PC. To convert the data from the time domain to the frequency domain, the fast Fourier transform (FFT) was applied to every signal, and the raw data obtained in this manner were separated into the different frequency bands with the help of a Butterworth filter.

3.4 Classification

Our work used Keras, an open-source Python deep learning framework, to create a Sequential model [47]. Dense layers, i.e. regular fully connected neural network layers, were used for the classification of data; in deep learning the input is analysed layer by layer, and 10 such layers were used here. Each layer used ReLU activation and is followed by a dropout layer to handle overfitting and a batch normalisation layer to normalise each batch with its mean and standard deviation. Scikit-learn’s StandardScaler was used to scale the data. The model was optimised with the Adamax optimiser, and the Keras binary cross-entropy loss class was used to compute the cross-entropy loss between true and predicted labels. The results are tabulated in Table 2.

Table 2 Evaluation metrics
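For illustration, a minimal sketch of the network described above is given below, using the Keras Sequential API. The layer width and dropout rate are illustrative assumptions; the 10 dense layers, ReLU activations, dropout, batch normalisation, Adamax optimiser and binary cross-entropy loss follow the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_features, n_layers=10, width=64, dropout=0.3):
    """Sequential net: Dense -> Dropout -> BatchNorm blocks, with a
    sigmoid output for the binary attentive/inattentive labels."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_features,)))
    for _ in range(n_layers):
        model.add(layers.Dense(width, activation="relu"))
        model.add(layers.Dropout(dropout))       # handle overfitting
        model.add(layers.BatchNormalization())   # normalise each batch
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=keras.optimizers.Adamax(),
                  loss=keras.losses.BinaryCrossentropy(),
                  metrics=["accuracy"])
    return model

# Inputs would first be standardised, e.g. with scikit-learn's
# StandardScaler, as described in the text.
```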

3.5 Feature extraction

This section explains the different methods by which features were extracted: statistical methods, the Welch periodogram [48] and the fast Fourier transform (FFT). FFT was used to separate the frequencies into the different frequency bands, and this feature extraction was applied to every electrode signal separately. With 10 signals multiplied by eight frequency bands, 80 features were obtained. With these feature extraction methods, a time-independent dataset was created; it contains a file for each participant, consisting of the values at each electrode with timestamps. For benchmarking the classification, the machine learning algorithms [49, 50] logistic regression (LR), linear discriminant analysis (LDA), K-neighbours classifier (KNN), decision tree classifier (DT), Naïve Bayes (NB) and Support Vector Machine (SVM) were used.
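A sketch of the FFT-based band separation for one electrode signal is shown below. The band boundaries are conventional EEG values, assumed here because the paper names the sub-bands but not their exact cut-offs.

```python
import numpy as np

FS = 256.0  # sampling frequency (Hz)

# Assumed sub-band edges in Hz (conventional choices).
BANDS = {"Delta": (1.5, 4), "Theta": (4, 8),
         "Alpha1": (8, 10), "Alpha2": (10, 13),
         "Beta1": (13, 20), "Beta2": (20, 30),
         "Gamma1": (30, 40), "Gamma2": (40, 50)}

def band_powers(signal, fs=FS):
    """Mean spectral power of one channel in each frequency band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2       # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)  # bin frequencies
    return {name: spectrum[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}
```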

3.6 Visualisation using OpenBCI

We used the OpenBCI GUI, a powerful tool from OpenBCI for recording and visualising the EEG data from the OpenBCI Cyton board. The GUI consists of mini tools called ‘widgets’ that fit into the interface panes. Numerous widgets are available in the tool, but this work used only four of them. The most important widget, which displays the EEG data, is the ‘Time series’ widget: it shows eight graphs representing the voltage detected by the eight electrodes of the EEG acquisition device. The wires of the Ultracortex match the colour codes of the GUI, making it easy to keep track of the electrode-channel mapping. If an electrode is providing poor signals, the GUI gives a ‘railed’ warning so that the electrode can be checked for proper positioning and contact. Another visualisation feature is the ‘FFT plot’ widget, which displays frequencies on the x-axis and the corresponding amplitudes on the y-axis; this widget is also colour-coded to match the channels of the time series widget. The head plot widget shows the regions of the brain with more activity in deep red, with the intensity decreasing as activity decreases. Finally, the band power widget displays the relative voltages of the different frequency sub-bands (Tables 3 and 4).

Table 3 Loss and accuracy
Table 4 Comparison of machine learning algorithms

3.7 Novel contributions

The novel contributions of this paper are as follows:

  1. A Spherical Gaussian kernel-based quadratic entropy model for the binary classification of EEG.

  2. A mutual information-based deep learning sequential network with 10 dense layers for the classification of EEG.

  3. Creation of an SSVEP dataset with educational and non-educational videos as visual stimuli.

  4. Validation of the entropy model by benchmarking the proposed model against traditional machine learning models.

4 Results

In this section, we present the experimental results for the participants’ EEG data. Since 10 volunteers participated in the experiment, with 10 videos over 10 days, we obtained 100 EEG data samples. The performance of the deep learning-based classifier was evaluated using tenfold cross-validation [51]. Cross-validation is a standard method in machine learning for evaluating the accuracy of classification and regression [52, 53]. In cross-validation, a specified fraction of the available data (one sample in every 10, in this case) is held out while training the machine learning model, so that the model never sees that subset. After training, this unseen set of data is applied to the model and the model’s accuracy is evaluated.
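A minimal sketch of this tenfold protocol with scikit-learn is given below; the stratified splitting, the epoch count and the helper names are our assumptions, with build_model being the hypothetical constructor sketched in Sect. 3.4.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def tenfold_accuracy(build_fn, X, y, epochs=150):
    """Train on nine folds and evaluate on the held-out fold, 10 times."""
    scores = []
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        model = build_fn(X.shape[1])  # fresh, untrained model per fold
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        scores.append(acc)
    return float(np.mean(scores)), float(np.std(scores))
```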

4.1 Evaluation metrics

The metrics used for classification and prediction are listed and described in Table 2.

All the experiments conducted for this study were performed by healthy individuals, who were all informed about the objectives and procedures of the experiments. Figure 10 shows sample EEG data acquired using the Ultracortex Mark IV equipment, visualised with the help of the OpenBCI GUI. The data so obtained were applied to a K-means clustering algorithm, which revealed that only 2 target classes could be identified; the elbow method was used to obtain the number of classes, as shown in Fig. 4. To obtain the correlation between the features, Canonical Correlation Analysis was done. K-means clustering was used to cluster the data into two classes, attentive and inattentive. Mutual information-based feature selection was applied after preparing a heatmap with the Pearson correlation technique.

Fig. 4
figure 4

Elbow method to find the number of clusters

The clusters were obtained with K = 2. The designed system is compared with the K-neighbours classifier, Naive Bayes classifier, SVM, decision tree classifier and logistic regression. K-fold validation with K = 10 was used to validate the results. The deep learning network is trained with the EEG characteristics of a few subjects, and in the final stage the EEG signals are classified as either ‘attentive’ or ‘non-attentive’. The mutual information-based network was initially applied to the public dataset, and the accuracy obtained was 99.21%. Since the dataset is labelled, traditional machine learning algorithms were also applied for classification.

Figure 7 shows the ground truth against the K-means, hierarchical, GMM and DBSCAN clustering models. It is evident from the figure that K-means performs better than the other models (Figs. 5 and 6).

Fig. 5
figure 5

Sample welch periodogram

Fig. 6
figure 6

Heatmap for calculating the correlation between features

At the first epoch, the accuracy was 55.70%; it kept increasing, reaching 99.41% at epoch 150. K-fold validation was used with the number of folds set to 10.

After compiling the model, the accuracy and validation accuracy were plotted, as shown in Figs. 8 and 9. Figure 8 shows the accuracy for the training set and the validation set, and Fig. 9 shows the loss for the training data and validation data plotted separately. A few representative values for different epochs are shown in Table 3. It can be inferred that accuracy increases and loss decreases with an increasing number of epochs.

Fig. 7
figure 7

Comparison of the different clustering models

Fig. 8
figure 8

Accuracy and validation accuracy plots

5 Conclusion and future work

In the proposed work, we developed a Spherical Gaussian kernel-based quadratic entropy model for the binary classification of EEG. We collected a new dataset of individuals watching a set of videos; it consists of EEG data from 10 individuals watching 10 videos, giving 100 data points for analysis. We demonstrated detection of attention levels with high accuracy, reaching 99.81% (best) and 99.21% (average). The entropy model was validated using a public dataset, and the proposed model was benchmarked against 6 other machine learning algorithms, outperforming all of them in accuracy. The mutual information-based deep learning EEG model can be used to detect attention levels in students as well as attention-related disorders such as ADHD [55], and it can be generalised to the detection of attention levels in other circumstances, for example the detection of Alzheimer’s disease. In earlier works, human actions were detected, analysed and controlled using numerous methods [56], such as exemplar-based methods [57], bags of visual words [58] and BCI-based methods [59, 60].

Although BCI is a promising technology and this work has used it successfully to recognise attention with good accuracy, it has a few demerits as well. One major demerit of BCI is the inconsistency of brain output from person to person and from time to time: when the subjects were stressed or tired, more time was needed to obtain a steady output before the video stimulus could be applied (Fig. 9). The recorded EEG data needed numerous pre-processing steps to obtain good accuracy, so real-time analysis could not be performed; if the pre-processing can be done quickly and efficiently, real-time attention monitoring becomes possible. The subjects wore the EEG cap for a few minutes only, including the time for obtaining stable signals and the time during which the video stimuli were applied; wearing the head cap for a long time may cause inconvenience to the users (Fig. 10).

Fig. 9
figure 9

Training loss and validation loss

Fig. 10
figure 10

Sample screenshot of OpenBCI GUI. It shows the time series, FFT plot and band power from an EEG reading

BCI is an interdisciplinary domain involving research in biology, engineering, applied mathematics and computer science. The Human Attention Recognition System (HARS) presented here was devoted to detecting the attention of a human being accurately. The attention data can be used in BCI-based control of electronic devices, ranging from a simple gaming vehicle to a complex electronic wheelchair. Incorporating BCI with the Internet of Things (IoT) can help humans control devices at home or in the office with their brain signals. Currently, evoked potentials from local electrodes are used in BCI. In future, a comfortable, thin layer of EEG-harnessing equipment could be developed that decodes more brain waves than those acquired from the electrodes. Further application of BCI in attention and action recognition gives hope to many physically compromised but mentally active people to lead a better life.