
1 Introduction

Most studies on electroencephalography (EEG) in this field have focused on the brain-computer interface (BCI), and numerous studies have applied EEG classification techniques to achieve such an interface. A typical example is a method using a dataset from a BCI competition; the latest dataset is from BCI Competition IV held in 2008, and methods with increasingly high classification accuracy have since been reported on it. For example, Ang et al. [1] applied a Filter Bank Common Spatial Pattern and conducted the classification using a naïve Bayesian Parzen window classifier. In addition, Gaur et al. [2] applied subject-specific multivariate empirical mode decomposition-based filtering and conducted the classification using the minimum distance to Riemannian mean method. As described above, when analyzing an EEG, a two-step process, namely, preprocessing followed by classification, is generally applied. Hence, various combinations of preprocessing and classification methods have been used, without a consistent choice for each classification target. However, it remains unclear whether such methods are readily applicable to an actual BCI.

The most effective technique for achieving a BCI with modern technology is the use of a neural network. Neural networks can approximate an arbitrary mapping from input to output and can thus be applied to any classification problem. In addition, processors specialized for neural network computation have been developed [4] and have recently been actively incorporated into small devices such as smartphones. The environment in which this approach can be deployed is therefore becoming increasingly common.

To date, EEG classification has been conducted in two stages: feature extraction through preprocessing, followed by classification using a classifier. Because a neural network can approximate an arbitrary mapping from input to output, it should be possible to use a model that produces a classification label directly from the raw EEG. In this study, we demonstrate that the window analysis previously applied to the signals can be reproduced in primitive form within a neural network and that a high classification accuracy can be obtained.

2 Method

A neural network has the flexibility to cope with various types of data, including images and time signals, and the degrees of freedom of the computation can be adjusted arbitrarily. Conversely, overfitting is likely to occur, and depending on the network configuration, numerous hyperparameters must be tuned. In addition, during training, many techniques, such as learning-rate adjustment and early stopping, must be considered. However, such complications can be avoided by applying only a few techniques, as described below.

2.1 Batch Normalization

Batch normalization [5] is a method that normalizes the input to each layer of a neural network to a mean of zero and a variance of one.

According to [6], the higher the learning rate is set, the stronger the regularization effect becomes, because the amount of noise owing to the mini-batch selection increases. Without normalization, the mean and variance grow exponentially as the layers deepen, but batch normalization keeps them constant, thereby making the gradient depend strongly on the input. Furthermore, the error does not diverge even when a high learning rate is set.
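As a minimal illustration of this operation, the following sketch normalizes a mini-batch to zero mean and unit variance; it assumes a PyTorch-style implementation (no framework is specified here) and omits the learnable scale and shift of a full batch normalization layer.

```python
import torch

def batch_normalize(x, eps=1e-5):
    # x: mini-batch of features with shape (batch, features).
    # Normalize each feature over the batch to zero mean and unit variance.
    mean = x.mean(dim=0, keepdim=True)
    var = x.var(dim=0, unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

# In practice, the framework layer (e.g., torch.nn.BatchNorm1d) would be used,
# which adds the learnable scale/shift and running statistics for inference.
```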

2.2 Convolution Layer

A convolution layer is used for inputs that have one or more dimensions of spatial features. A convolution is applied along the spatial feature directions using a kernel. In a convolution layer, the number of weight parameters is smaller than the number of input and output features, which has a regularizing effect. Because an EEG is a time signal, using a convolution layer that convolves in the time direction reduces the need to consider additional regularization.
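The following sketch shows such a time-axis convolution over an EEG-shaped input, again assuming PyTorch; the channel and kernel sizes are illustrative only.

```python
import torch
import torch.nn as nn

# 1D convolution along the time axis of an EEG of shape (batch, channels, time).
# 22 input channels, 32 output feature maps, kernel spanning 9 time samples.
conv = nn.Conv1d(in_channels=22, out_channels=32, kernel_size=9, padding=4)

x = torch.randn(8, 22, 252)   # e.g., 8 trials, 22 EEG channels, 252 samples
y = conv(x)                   # -> (8, 32, 252); the same small kernel is reused
print(y.shape)                #    at every time step, keeping the parameter count low
```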

2.3 Transposed Convolution

A transposed convolution is a transformation that operates in the opposite direction to a convolution. If a convolution acts as an encoder, a transposed convolution behaves like a decoder. Up-sampling can be conducted in the spatial feature direction, and the feature map can be projected onto a higher-dimensional space.
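As a small sketch of this behavior (PyTorch assumed, sizes illustrative), a transposed convolution with stride 4 up-samples a feature map along its spatial axis by a factor of four.

```python
import torch
import torch.nn as nn

# Transposed convolution acting as a learned up-sampler / decoder:
# with stride 4 and kernel size 4, the spatial length grows by a factor of 4.
deconv = nn.ConvTranspose1d(in_channels=32, out_channels=16, kernel_size=4, stride=4)

z = torch.randn(8, 32, 63)    # a compressed feature map
x_up = deconv(z)              # -> (8, 16, 252): projected onto a larger spatial space
print(x_up.shape)
```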

2.4 Network Structure

Two different network structures were applied in this study. One is a 1D CNN model that convolves the EEG only along the time axis. In this model, global average pooling is conducted after four convolution layers, and classification is performed by a fully connected layer. The other is a 2D CNN model that applies convolution layers after the EEG has been transformed into a form with 2D spatial features using a transposed convolution. There are several benefits to converting from 1D to 2D. For example, when a convolution layer is used, the ratio of the number of weight parameters to the number of inputs and outputs decreases; thus, the regularization effect is enhanced. In addition, because many useful pretrained 2D CNN models exist, some of which have achieved high results in the ImageNet Large Scale Visual Recognition Challenge, transfer learning becomes possible. In the process of transforming the 1D spatial features into 2D spatial features through the transposed convolution, a window analysis is applied to the time signal, and the features are extended along the new axis. Thus, the calculation corresponding to the conventional preprocessing can be reproduced in primitive form by the neural network.

Although the input EEG has spatial features in two directions, namely the time direction and the channel direction, the channels are arranged in the order of the electrode numbers assigned at the time of measurement, and this order is not a spatially meaningful feature. The EEG channel direction is therefore assigned to the unconstrained channel dimension of the convolution layer.

Tables 1 and 2 show the specific configurations of the above two models:

Table 1. 1D CNN model structure
Table 2. 2D CNN model structure
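The exact layer configurations are given in Tables 1 and 2 and are not reproduced here; the following is only a minimal sketch of the 1D CNN outline described above (four time-axis convolutions with batch normalization, global average pooling, and a fully connected classifier), assuming PyTorch and using illustrative layer widths rather than the values from Table 1.

```python
import torch
import torch.nn as nn

class CNN1D(nn.Module):
    # Sketch of the 1D CNN outline: four time-axis convolutions with batch
    # normalization, global average pooling over time, and a fully connected
    # classifier for the four motor imagery classes.
    def __init__(self, n_channels=22, n_classes=4):
        super().__init__()
        widths = [n_channels, 32, 64, 64, 128]   # illustrative, not Table 1
        layers = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            layers += [
                nn.Conv1d(c_in, c_out, kernel_size=9, padding=4),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
            ]
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool1d(1)      # global average pooling
        self.classifier = nn.Linear(widths[-1], n_classes)

    def forward(self, x):                        # x: (batch, 22, time)
        h = self.features(x)
        h = self.pool(h).squeeze(-1)             # (batch, 128)
        return self.classifier(h)                # class logits

logits = CNN1D()(torch.randn(8, 22, 252))        # -> (8, 4)
```

The 2D CNN model follows the same outline, except that a transposed convolution first expands the signal along a new axis so that 2D convolution layers can be applied.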

2.5 Dataset

The BCI Competition IV 2a dataset [3] was used for the network evaluation. This dataset contains EEG signals recorded from nine subjects performing four types of motor imagery tasks: right hand, left hand, tongue, and foot. There are 22 EEG channels, and the sampling frequency is 250 Hz. For each subject, 288 training trials and 288 test trials were recorded, including missing parts and some trials marked for exclusion.

After replacing the missing values with zeros and applying a low-pass filter, the signals were down-sampled to 63 Hz, approximately one-fourth of the original sampling frequency, and all trials, including those marked for exclusion, were used. As the input, the 4 s segment of each trial during which the motor imagery was performed was used. The signal was normalized to a mean of zero and a variance of one for each trial and each channel before being input into the neural network.
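A minimal sketch of this preprocessing, assuming NumPy/SciPy, is shown below; the low-pass filter used in this work is not specified in the text, so SciPy's default anti-aliasing filter in decimate stands in for it.

```python
import numpy as np
from scipy.signal import decimate

def preprocess_trial(trial, factor=4):
    # trial: array of shape (channels, samples) recorded at 250 Hz.
    x = np.nan_to_num(trial, nan=0.0)            # replace missing values with zeros
    x = decimate(x, factor, axis=-1)             # low-pass filter + down-sample (~63 Hz)
    mean = x.mean(axis=-1, keepdims=True)        # per-channel statistics for this trial
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + 1e-8)             # zero mean, unit variance per channel
```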

Fig. 1. Timing scheme of the paradigm

2.6 Training and Evaluation

Cross-entropy was used as the cost function, Adam [7] was used as the optimizer, the mini-batch size was 58, and the learning rate was 0.0001. The parameters were updated 1,000 times without changing the learning rate or applying early stopping.

Five-fold cross-validation was conducted six times for each subject using the training data, so the classification accuracy on the test data was observed 30 times per subject.
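The following is a training-loop sketch with these settings (PyTorch assumed); the model and the randomly generated data are placeholders and do not correspond to the models in Tables 1 and 2 or to the actual dataset.

```python
import torch
import torch.nn as nn

# Placeholder model and data; the settings below match the text:
# cross-entropy loss, Adam, mini-batch size 58, learning rate 1e-4,
# 1,000 updates, no learning-rate schedule, no early stopping.
model = nn.Sequential(
    nn.Conv1d(22, 32, kernel_size=9, padding=4),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 4),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

train_x = torch.randn(288, 22, 252)       # placeholder: 288 trials of one subject
train_y = torch.randint(0, 4, (288,))     # placeholder: 4 motor imagery classes

for step in range(1000):
    idx = torch.randint(0, len(train_x), (58,))   # draw a mini-batch of 58 trials
    optimizer.zero_grad()
    loss = criterion(model(train_x[idx]), train_y[idx])
    loss.backward()
    optimizer.step()
```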

3 Results

Table 3 shows the mean and standard deviation (std) of the accuracy when the validation and test data were classified by the models after the parameter updates. The classification accuracy of the 2D CNN model was higher than that of the 1D CNN model.

Table 4 shows the accuracy converted into the kappa value and compared with values from previous studies. The p-values were computed using the Wilcoxon signed-rank test. The neural network containing the transposed convolution achieved a mean kappa of 0.62, which is superior to all previously reported results.

Table 3. Accuracy validation and evaluation
Table 4. Evaluated kappa value comparison with previous studies

4 Discussion

A high accuracy was obtained by extending the EEG into a 2D map using a transposed convolution. In this method, the transposed convolution extends the signal along a new axis with a fixed window width. The best-known conventional method for extending a signal into a 2D map is the short-term Fourier transform (STFT). When an STFT is used, the frequency axis is extended by the same number of bins as the window width, but only frequency information at equal intervals is extracted, and the usefulness of that information for the task at hand is not guaranteed. By incorporating the preprocessing into the neural network, the window width and the number of expansions can be set arbitrarily, and features effective for classification are learned. Furthermore, by visualizing the parameters and the output of each layer, the understanding of the EEG may be improved through analysis of which source components contained in the signal are necessary for classification.
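For comparison, the fixed expansion produced by an STFT can be illustrated as follows (SciPy assumed; the window length of 63 samples is an arbitrary example): the number of frequency bins is determined by the window width and the bins are equally spaced, regardless of which bands are actually informative.

```python
import numpy as np
from scipy.signal import stft

fs = 63                                   # sampling rate after down-sampling
x = np.random.randn(252)                  # stand-in for one 4 s EEG channel
f, t, Z = stft(x, fs=fs, nperseg=63, return_onesided=False)
print(Z.shape)                            # (63, n_frames): as many equally spaced
                                          # frequency bins as the window width
```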

Fig. 2. Loss and accuracy curve of the validation data

In this neural network, the learning rate is an important parameter because batch normalization is used in each layer. Although the weight parameters do not diverge owing to the effect of the normalization, if the learning rate is too large, the regularization effect becomes too strong and the accuracy does not improve. Figure 2 shows the mean and std of the learning curves over the six runs of 5-fold cross-validation. Observing the transition of the validation loss, the loss continues to decrease near the end of training. Hence, the accuracy might be improved by adjusting the learning rate for each validation fold or according to the training epoch.

5 Conclusion

We attempted to learn a transformation equivalent to preprocessing with an arbitrary window width and spatial size by applying a transposed convolution to EEG signals, followed by a simply structured network to which a 2D CNN can naturally be applied. We showed that a high accuracy can be achieved by replacing the complex preprocessing of EEGs with a neural network.