1 Introduction

Cognitive training is a set of mental activities that can improve or maintain a person's cognitive abilities, and it is closely related to the neuropsychological theories of brain plasticity and life span (Willis and Schaie 2009). Many cognitive abilities can be trained, such as memory, reasoning or speed of processing, and there are different ways to train each of them. For example, Ball et al. (2002) used exercises such as remembering word lists or paragraphs and finding patterns in series of letters and numbers. In that paper, the authors showed that cognitive training improves the cognitive abilities of older adults, which may in turn improve their everyday life.

A novel method of cognitive training is through video games, which can be considered a type of serious game for health (Mondéjar et al. 2019). The main reason for using video games for brain training is that they can generate situations in which many brain areas are stimulated. Video games can also engage users, keeping them actively interested in the cognitive training (Hervás et al. 2017) and even emotionally involved with the game (Johnson et al. 2018). Many commercial video games are sold as cognitive training games; however, there is controversy about the effectiveness of these kinds of games (MPI4HD and SCL 2019; CTD 2019), so it is crucial to validate them scientifically.

One way to validate cognitive training is through electroencephalography (EEG), a non-invasive technique that detects changes in the electric potential at the scalp by using electrodes. EEG signals can be analyzed by visually inspecting graphs of raw or preprocessed data, or through computational methods. Computational methods can extract a lot of information that cannot be detected just by looking at the graphs of the raw data. Apart from the scientific convenience of using EEG analysis to study cognitive abilities (Klimesch 1999), there is an increasing availability of inexpensive wireless EEG devices that enable new pervasive perspectives (out-of-lab and out-of-clinic uses). For example, there are proposals that combine IoT principles with EEG devices for use everywhere (Chatterjee et al. 2014; Konstantinidis et al. 2015), psychological diagnosis with video games (Mondéjar et al. 2016) or EEG-based emotional recognition (Menezes et al. 2017). These emerging fields have led researchers to explore novel paradigms in the fields of Ambient Intelligence and Neurosciences.

The number of EEG analysis techniques is large, so one of the goals of this paper is to focus on some of the most important methods used when working with video games. To that end we conducted a systematic review, described in (Cabañero-Gómez et al. 2018), the previous contribution that this paper extends. This study provides the most promising techniques to quantitatively analyze cognitive activity during the use of video games through EEG signals. A general-purpose software tool, named eeglib, has been developed which includes those techniques. eeglib has been tested, both technically and functionally, with data from experiments that involved video games. This work is the first step towards the creation and validation of cognitive training games.

In the next section, we describe the systematic review that provided us with the candidate EEG analysis techniques; we show the process followed and the results of the review, which include the selected articles and a list of the techniques. Later, in Sect. 3, we present the developed software that includes the techniques obtained in the previous section. In Sects. 4 and 5, we show how we tested the software from a technical and a functional perspective, respectively. Lastly, in Sect. 6, we discuss the results of the paper and the future work derived from its contributions.

2 Systematic review

Given the increasing popularity of using video games along with EEG, it is interesting to explore the different ways to analyze these signals in this specific context. In this case, we want to focus on computational techniques that can be used to work with EEG data. From this premise, we have developed our research question:

RQ1. What are the main tools, methods and techniques that are being used to analyze EEG data when using video games?

Thus, we have conducted a systematic review to answer RQ1. The protocol for the systematic review was the following: (a) to determine which electronic database(s) should be used to explore the state of the art related to the objective of this study; (b) to clearly identify the target keywords and define the search string; (c) to define inclusion/exclusion criteria; (d) to screen, by title and abstract, the documents that met the eligibility criteria; (e) based on the content of the papers, to select those documents that provide information about EEG analysis techniques; (f) to determine the metrics, characterize them and provide the results of this systematic review.

2.1 Systematic review process

The systematic review process related to the protocol in the previous paragraph is illustrated in the diagram in Fig. 1. The first node of that diagram corresponds to the initial search string with some restrictions about the date and language.

Fig. 1 Diagram summarizing the systematic review process

TITLE-ABS-KEY (eeg AND (video*gam* OR game) AND (techniqu* OR meth* OR (machine AND learning) OR analy*) AND NOT (ecg OR fmri)) AND PUBYEAR > 1997 AND PUBYEAR < 2018 AND (LIMIT-TO (LANGUAGE, “English”))

The string was searched in Scopus and ACM. After applying the eligibility criteria, which include a minimum of 4 citations per year, the number of papers is reduced to 57. Next, a manual screening of titles and abstracts is applied to ensure the papers fit the proposed objective, which reduces the number of documents to 36. Finally, another manual screening is performed to select the most interesting papers, which produces the final list of 14 selected documents. This whole process is described in more detail in (Cabañero-Gómez et al. 2018).

2.2 Systematic review results

From the 14 finally selected documents we have identified up to 31 different algorithms and techniques to analyze or preprocess EEG data. Table 1 shows the selected articles, alongside a summary and the techniques and algorithms identified in each paper. Exploring these documents, we have observed that they show different approaches joining EEG and video games, with Brain Computer Interfaces (BCI) being the most recurrent, although there are many articles talking about cognitive analysis too.

Table 1 Final paper selection from systematic review

In these papers we have identified many techniques and studied what they can be used for. Table 2 shows a taxonomy of the main techniques by usage, with a brief description of each. Note that some techniques fall under more than one use; for example, LDA can be used for classification or for dimensionality reduction. The information in this table can be used to select which techniques to use depending on the analysis needs. This is one of the partial contributions of this paper: to provide the scientific community with a simple and neutral taxonomy of the techniques used in the literature to analyze EEG signals during the use of video games.

Table 2 A classification of the identified techniques based on their possible uses

Additionally, the techniques have been ranked by popularity, the criterion being the number of papers in which a technique appears (in the title, the abstract or the keywords). It is also interesting to check whether the popularity of a technique differs significantly between video-game-oriented use and general EEG analysis. For that, we normalized the number of documents that include a technique by the total number of documents. In Fig. 2 we show the 10 most popular techniques when working with video games (blue bars) and when they are used for general EEG analysis (red bars). Some interesting conclusions can be extracted from it: some techniques, such as ERP, SVM, ICA and Bandpass Filter, have a similar popularity in both contexts, whereas others are preferred for one of the purposes. For example, FFT, one of the most used techniques in general analysis, appears less than half as often when working with video games. On the other hand, Sample Entropy, Genetic Algorithms and Neural Networks are used much more when working with video games. It is important to notice that the most popular technique is only used 7% of the time, which means that there are many approaches to working with EEG and none is especially common. This ranking can help researchers in cognitive analysis with video games to select the most applied techniques for EEG analysis, but it is a popularity ranking and does not imply that those techniques can be applied in every case. Their usefulness and effectiveness depend on the situation, so prior research and experimentation may be necessary to select the best one in each case.
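The normalization used for the ranking can be illustrated with a small sketch. The counts and totals below are hypothetical placeholders, not the values behind Fig. 2; only the arithmetic (documents mentioning a technique divided by the total number of documents in the same context) follows the description above.

```python
def normalized_popularity(counts, total_documents):
    """Return the share of documents that mention each technique."""
    if total_documents <= 0:
        raise ValueError("total_documents must be positive")
    return {name: n / total_documents for name, n in counts.items()}


# Hypothetical document counts, for illustration only.
video_game_counts = {"ERP": 14, "FFT": 6, "Sample Entropy": 4}
general_counts = {"ERP": 7000, "FFT": 6500, "Sample Entropy": 400}

video_game_share = normalized_popularity(video_game_counts, total_documents=200)
general_share = normalized_popularity(general_counts, total_documents=100000)

# A ratio above 1 means the technique is relatively more common with video games.
relative_use = {
    name: video_game_share[name] / general_share[name]
    for name in video_game_counts
}
```

Comparing shares rather than raw counts is what makes the two contexts comparable, since the general EEG literature is much larger than the video game subset.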

Fig. 2 The 10 most popular techniques when using EEG and video games

3 Software developed

From the results of the systematic review, we have selected the techniques to be implemented in a Python library. Python was selected because of its ease of use and its popularity for data analysis. The library focuses on preprocessing and feature extraction techniques, because many existing tools already provide classification and statistical analysis algorithms and visualization techniques, which are out of the scope of the library. We have also included some techniques that did not appear in the systematic review. The main reason is that those techniques are closely related to others that are in the results of the systematic review and they are widely used for EEG analysis in other contexts. One example is Detrended Fluctuation Analysis (Peng et al. 1994), which estimates the degree of self-affinity (a concept related to fractal dimension) of a sequence and is applicable to non-stationary signals.

The final list of techniques included is the following: Band-pass filter, Independent Component Analysis, Fast Fourier Transform, Hamming Window, averaging, Hjorth parameters, Higuchi Fractal Dimension, Petrosian Fractal Dimension, Detrended Fluctuation Analysis, Lempel–Ziv Complexity, Sample Entropy, Pearson Cross-Correlation Coefficient and Synchronization Likelihood. Some of these techniques have implementations in popular Python libraries, so those implementations have been integrated into our library, while the others have been implemented from scratch. Note that the most popular technique in the ranking, ERP, has not been included in the library. That is because ERP cannot be applied over a single window to extract information directly from it. However, it is still possible to perform ERP using this library, because it allows manual segmentation of the signal, so the segments can be extracted and later averaged.
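To give a flavor of the kind of single-window features in this list, the following is a minimal NumPy sketch of two of them, the Hjorth parameters and the Petrosian Fractal Dimension, written from their standard textbook definitions. It is not eeglib's implementation, and the function names are assumptions chosen for this example.

```python
import numpy as np


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) of a 1-D signal."""
    x = np.asarray(signal, dtype=float)
    dx = np.diff(x)    # first derivative (finite differences)
    ddx = np.diff(dx)  # second derivative
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity


def petrosian_fd(signal):
    """Petrosian Fractal Dimension from sign changes of the derivative."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    dx = np.diff(x)
    # Count the points where the derivative changes sign.
    n_delta = np.count_nonzero(dx[1:] * dx[:-1] < 0)
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))


# Example input: 8 full cycles of a sine wave over 1024 samples.
t = np.arange(1024)
sine = np.sin(2 * np.pi * 8 * t / 1024)
activity, mobility, complexity = hjorth_parameters(sine)
sine_pfd = petrosian_fd(sine)
```

For a pure sinusoid the Hjorth complexity is close to 1, and a signal whose derivative never changes sign has a Petrosian dimension of exactly 1, which makes both cases convenient sanity checks.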

The library performs window-based processing, which consists of selecting fixed-size slices of the signal vector and applying the algorithms to those slices. The size and offset of the windows can be modified by the user, so the windows can overlap, be adjacent or leave parts of the signal uncovered. This approach makes it possible to study the evolution of the features over the duration of the signal. The library also includes tools to load data from file types commonly used to store EEG data, such as CSV or EDF, and a tool to easily generate datasets of features over the windows for the whole signal or a part of it. It has been designed with a focus on usability, allowing data scientists to generate processed datasets with a few lines of code, as can be seen in Listing 1, which shows an example of data generation using this library. The library is called “eeglib” and it can be found at github.com/Xiul109/eeglib, along with some examples explaining how to use it.

Listing 1 Example of data generation using eeglib
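Independently of eeglib's own interface, the window-based processing described above can be sketched in plain NumPy. The code below is not Listing 1 and does not use the eeglib API; the names sliding_windows and feature_table are hypothetical and exist only for this illustration.

```python
import numpy as np


def sliding_windows(signal, window_size, step):
    """Yield (start_index, window) pairs over the last axis of `signal`."""
    signal = np.asarray(signal)
    n_samples = signal.shape[-1]
    for start in range(0, n_samples - window_size + 1, step):
        yield start, signal[..., start:start + window_size]


def feature_table(signal, window_size, step, features):
    """Apply each feature function to every window; one row per window."""
    rows = []
    for start, window in sliding_windows(signal, window_size, step):
        row = {"start": start}
        for name, func in features.items():
            row[name] = func(window)
        rows.append(row)
    return rows


# Example: 10 s of a synthetic single-channel signal sampled at 128 Hz.
rng = np.random.default_rng(0)
signal = rng.standard_normal(10 * 128)
features = {"mean": np.mean, "variance": np.var}

overlapping = feature_table(signal, window_size=128, step=64, features=features)
adjacent = feature_table(signal, window_size=128, step=128, features=features)
with_gaps = feature_table(signal, window_size=128, step=256, features=features)
```

A step smaller than the window size gives overlapping windows, a step equal to it gives adjacent windows, and a larger step leaves parts of the signal uncovered, which matches the three cases mentioned in the text.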

The general structure of the library is shown in Fig. 3, which illustrates the dependencies between the modules. The modules features and preprocessing contain the implementations of the analysis techniques. These modules are used by the eeg module, which contains the main data structures that store the signals’ data. The helper module includes classes that help the user load EEG signal files. The wrapper module includes utilities that facilitate the generation of data from the EEG signals. Lastly, auxFunctions includes auxiliary functions that may be needed by other modules of the library. Generally, helper and wrapper are the main modules that data scientists are going to use, given that they deal with loading data from different sources and generating new data after extracting features from a signal.

Fig. 3 Module dependencies of eeglib

4 Technical test

The technical test uses unit testing to check whether the output of the techniques is as expected. We have only tested the techniques we implemented from scratch, because the algorithms taken from external libraries have their own tests. Also, only techniques with complex implementations have been tested in this way, namely DFA, Sample Entropy, LZC and HFD. There are many approaches to do this, such as storing a synthetic signal with known outputs for each algorithm. In our case we randomly generate synthetic noise signals of different types (white, Brownian and pink) and use them to test the algorithms. Each of these types of signal has a different frequency-power relation. If we define the function P(f), which relates power (P) to frequency (f), we have the following definitions for each type of colored noise:

  • White noise: \(P\left( f \right) = 1\)

  • Pink noise: \(P\left( f \right) = 1/f\)

  • Brownian noise: \(P\left( f \right) = 1/f^{2}\)
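One common way to obtain signals with these power laws is to shape the spectrum of white Gaussian noise, scaling each frequency component by f^(-k/2) so that the power scales as 1/f^k. The sketch below assumes this spectral-shaping approach; the paper does not state which generator was used, and the function name colored_noise is hypothetical.

```python
import numpy as np


def colored_noise(n_samples, exponent, rng=None):
    """Noise with P(f) proportional to 1 / f**exponent.

    exponent = 0 gives white, 1 gives pink and 2 gives Brownian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)
    # Power is amplitude squared, so amplitudes are scaled by f**(-exponent/2).
    scale = np.zeros_like(freqs)  # the DC component (f = 0) is removed
    scale[1:] = freqs[1:] ** (-exponent / 2.0)
    shaped = np.fft.irfft(spectrum * scale, n=n_samples)
    return shaped / shaped.std()  # normalize to unit variance


rng = np.random.default_rng(42)
white_noise = colored_noise(4096, 0, rng)
pink_noise = colored_noise(4096, 1, rng)
brownian_noise = colored_noise(4096, 2, rng)
```

Because the underlying white noise is random, the spectrum of any single realization only follows the target law approximately, which is why outputs are averaged over several signals when such noise is used for testing.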

Fig. 4 shows an example of each type of noise in the time and frequency domains. As can be seen, the relations described above appear in the frequency domain, but the values are not exact, because randomness is involved in the generation of those signals. This approach has been used previously for DFA (Herrmann 2019a), Sample Entropy (Herrmann 2019b) and Fractal Dimension (Higuchi 1988). No previous works using this approach have been found for LZC, but we applied it and checked that the outputs were stable and consistent. Using that knowledge, we have determined which values each technique should return for the different noise signal inputs and displayed them in Table 3. This table can be helpful for developers who need to implement these algorithms themselves.

Fig. 4 Colored noise signals in the time and frequency domains

Table 3 Expected output for each type of noise and each algorithm

We apply each algorithm to the synthetic noise signals many times and compare the average result with the expected result, which avoids bias derived from randomness. Sometimes, when an implementation is modified, the output changes slightly, but that does not mean that it is incorrect. This can occur, for example, when the default parametrization is changed or when the precision of the algorithm is intentionally reduced to decrease its execution time. With the proposed method we do not aim for an exact comparison against a fixed output but for an approximate one. The main advantage is that we can modify the algorithm (for example to optimize it or to include new parameters) without having to worry about what the new fixed result should be. This method is especially useful for complexity algorithms, because each colored noise signal has its own complexity. All the implementations passed this testing, some of them on the first try and some after being modified. In this case, a test is considered passed when the result is within ± 0.05 of the expected value.
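The testing strategy can be sketched as follows, using a compact first-order DFA and white noise as the example. This is an illustrative implementation, not eeglib's, and the expected value of 0.5 is the commonly cited theoretical DFA exponent for white noise rather than a value copied from Table 3. The scales, signal length and number of runs are assumptions chosen for this sketch.

```python
import numpy as np


def dfa(signal, scales=(16, 32, 64, 128, 256, 512)):
    """First-order Detrended Fluctuation Analysis scaling exponent."""
    x = np.asarray(signal, dtype=float)
    profile = np.cumsum(x - x.mean())
    fluctuations = []
    for n in scales:
        n_segments = len(profile) // n
        segments = profile[: n_segments * n].reshape(n_segments, n)
        t = np.arange(n)
        # Fit one linear trend per segment (polyfit accepts 2-D y, one column each).
        slope, intercept = np.polyfit(t, segments.T, 1)
        trends = np.outer(slope, t) + intercept[:, None]
        fluctuations.append(np.sqrt(np.mean((segments - trends) ** 2)))
    # The exponent is the slope of log F(n) against log n.
    return np.polyfit(np.log(scales), np.log(fluctuations), 1)[0]


def averaged_output(algorithm, make_signal, runs=20):
    """Average the algorithm output over several random signals."""
    return float(np.mean([algorithm(make_signal()) for _ in range(runs)]))


def passes(algorithm, make_signal, expected, tolerance=0.05, runs=20):
    """Approximate check: the averaged output must be near the expected value."""
    return abs(averaged_output(algorithm, make_signal, runs) - expected) <= tolerance


rng = np.random.default_rng(0)
white_noise_passes = passes(dfa, lambda: rng.standard_normal(8192), expected=0.5)
```

Keeping the tolerance as a parameter reflects the point made above: the check constrains the output to a range, so the implementation can be optimized or re-parametrized without rewriting the expected value.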

5 Functional test

The aim of the functional test is to check the effectiveness of the developed software with real data. We designed and executed an experiment in which EEG data was collected from users while they played video games. Data from only one experiment could be insufficient, so we also searched for open EEG data from another experiment. In both cases, we aimed to check the capability of the features to discern between small windows of signals associated with different cognitive states. The window size is determined experimentally but constrained to around 0.5–2 s, because if the window is too small, the information in the signal is lost, and if it is too big, temporal accuracy is lost. Also, both experiments involved twelve subjects, which may be a small sample for generalizing the results, but it provides initial (and significant) findings that will guide future experiments.

The materials, intervention and participants differ between the two experiments, as we can see in Fig. 5. However, the data analysis process was similar for both. It consists of six phases: (1) we divide the data into two partitions and label each of them; (2) we divide the signal into windows, which may or may not overlap; (3) we extract features from each window of the signals; (4) we train a classifier (random forest) with the features and the labels; (5) we select the most useful features and, if the classification was good but too many features are being used, return to phase 2 in order to reduce the complexity of the random forest; (6) lastly, we analyze data extracted from the classifier to obtain valuable knowledge from it.

Fig. 5 Experiments summary

There are two main reasons for using random forest as the classifier. The first is that it is robust against noise, which is necessary for the type of data we are working with. The second is that it is possible to extract the importance of each feature for the classification, which is very useful for interpreting the results.
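The classification and feature-importance steps can be sketched with scikit-learn, which is an assumption here since the paper does not name the classifier implementation. The feature matrix is synthetic and hypothetical: one column is made informative about the label and the others are pure noise, so that the importance ranking has a known answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_windows = 200
labels = np.repeat([0, 1], n_windows // 2)  # balanced labels, one per window

# Hypothetical features: one informative column and three noise columns.
feature_names = ["informative", "noise_1", "noise_2", "noise_3"]
X = rng.standard_normal((n_windows, len(feature_names)))
X[:, 0] = 2.0 * labels + 0.5 * rng.standard_normal(n_windows)

classifier = RandomForestClassifier(n_estimators=100, random_state=0)

# Cross-validated accuracy, reasonable here because the classes are balanced.
scores = cross_val_score(classifier, X, labels, cv=5, scoring="accuracy")

# Fit on all data to read the impurity-based feature importances.
classifier.fit(X, labels)
importances = dict(zip(feature_names, classifier.feature_importances_))
ranking = sorted(importances, key=importances.get, reverse=True)
```

The ranking can then drive the feature selection of phase 5: the least important features are dropped and the classifier is retrained on the reduced set.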

5.1 Experiment 1: virtual reality games

We started from the premise that there are cognitive differences between playing Virtual Reality (VR) games and traditional games, and we designed an experiment to check whether that premise was true. Twelve (12) volunteers participated in the experiment. They were informed about what the experiment consisted of and they signed a consent document. Each volunteer played two video games, each of which has a traditional version and a VR version. Those video games were “Temple Run”, an endless runner that requires timely actions, in which the player controls a man who is running from enemies and collecting prizes; and “Play With Me”, a short first-person horror game in which the player controls a child who needs to collect four candles while avoiding a clown. Each version of each game was played for between 3 and 4 min, and the play order was not the same for all the participants to prevent a possible bias. The VR device used was Oculus Go. The data was recorded using the Emotiv Epoc+ EEG headset, which has 14 channels and a sample rate of 128 Hz. The data from the players was anonymized by using a numerical identifier.

The two partitions of the data in this case were VR games, with label 1, and non-VR games, with label 0. The window size used for this analysis was one second (128 samples). After three iterations of steps 3, 4 and 5, the selected features were DFA and PFD. The metric used to assess the classification was accuracy, because the number of elements for each label was balanced. After performing a cross-validation process we obtained an accuracy of 98.88%, which is a high result. Additionally, to obtain extra information about the classification, we extracted the importance of each feature, which is illustrated in Fig. 6. It is remarkable that the two most important features are calculated over channel 4. This channel corresponds to an electrode placed over the temporal lobe, which has a role in the processing of visual perception.

Fig. 6 Importance of the 10 most important features in the classification of signals associated with VR and non-VR games

5.2 Experiment 2: affective pacman

For the second experiment, we selected an open EEG dataset recorded while playing a video game called “Affective Pacman”. This data was originally obtained for Reuderink et al. (2009), where the authors studied the effects of frustration while detecting events in a BCI system. In the experiment the players had to play “Affective Pacman”, a Pacman version designed to produce frustration by making the controls unresponsive or freezing the screen in some levels of the game. The EEG signals were recorded at 512 Hz by a device with 32 channels. The data is properly labelled, including sections (e.g., frustrating, non-frustrating) and specific events (e.g., key pressed, start game).

With that data it is possible to discern between frustrating and non-frustrating periods by using the same analysis process as in the previous experiment. In this case, the window size was 256 samples (half a second). The features selected after following steps 3, 4 and 5 were the average power spectra for each typical EEG frequency band: delta (1–4 Hz), theta (4–8 Hz), alpha (8–12 Hz) and beta (12–30 Hz). The accuracy obtained after cross-validation was 98.98%. Extracting the feature importances, we obtained the twelve most important features for the classification, whose values are displayed in Fig. 7. In this case, the importance is spread across the features, with the most important one not reaching 5%, but we can see that they all belong to the delta frequency band.
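A band-power feature of this kind can be sketched with NumPy's FFT. The band limits are the ones given above; applying a Hamming taper before the transform and treating each band as a half-open interval are assumptions of this sketch, and band_powers is a hypothetical name rather than an eeglib function.

```python
import numpy as np

# Band limits in Hz, as listed in the text.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12), "beta": (12, 30)}


def band_powers(window, sample_rate, bands=BANDS):
    """Average spectral power of a 1-D window in each frequency band."""
    window = np.asarray(window, dtype=float)
    tapered = window * np.hamming(len(window))  # reduce spectral leakage
    power = np.abs(np.fft.rfft(tapered)) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    # Each band is taken as the half-open interval [low, high).
    return {
        name: float(power[(freqs >= low) & (freqs < high)].mean())
        for name, (low, high) in bands.items()
    }


# Example: a 256-sample window at 512 Hz containing a 10 Hz oscillation.
sample_rate = 512
t = np.arange(256) / sample_rate
window = np.sin(2 * np.pi * 10 * t)
powers = band_powers(window, sample_rate)
strongest = max(powers, key=powers.get)
```

With 256 samples at 512 Hz the frequency resolution is 2 Hz, so the delta band is covered by a single frequency bin in this configuration.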

Fig. 7 Importance of the 12 most important features in the classification of signals associated with frustrating and non-frustrating periods for the “Affective Pacman” dataset

6 Conclusions and future work

The starting point of this article was cognitive training through video games and the importance of EEG in assessing the effectiveness of the training. From that premise, we have framed this article as a first step towards a bigger objective: developing a method to design video games for cognitive training and a method to assess the effectiveness of a game as a tool for cognitive training. This first step has consisted of developing and testing the tools needed to evaluate cognitive performance during the use of video games.

In this article, we have shown a process to search, select and implement techniques for EEG signal analysis with video games. From the systematic review, there are two partial contributions: a table summarizing the most used techniques and their uses, and a popularity ranking of those techniques. These contributions can help researchers to identify the most promising techniques to analyze EEG signals when using interactive applications. The main contribution is the library that includes those techniques, which was subjected to technical and functional testing. In the technical testing, we have verified the correct output of the algorithms. The functional testing has shown the capability of the library to discern between different cognitive states, even with a very small experimental population. This functionality will be useful in future studies of cognitive training effects between a trained group and a control group. Apart from that, the experiments provided the following additional findings: the first experiment shows the higher importance of the temporal lobe when using virtual reality, and the second experiment indicates a prominent presence of frustration in the delta frequency band.

With the main goal in mind, we will use the developed library in the future to determine which specific cognitive abilities are being used while playing video games. For that, it would be useful to be able to quantify the level of usage of an ability in order to compare the initial state (before the training) and final state (after the training) of the player. We will develop one or more video games for cognitive training, and then assess them to check whether they are effective.