1 Introduction

Electroencephalography (EEG) has been used in several clinical applications, particularly for the diagnosis of neurological conditions such as epilepsy [1] and sleep disorders [2]. Earlier studies have also shown that this technique provides information about differences between individuals related to anatomical and functional brain traits [3, 4]. More recently, the idea of using EEG signals to distinguish between individuals, with the aim of implementing a biometric system [5, 6], has been explored in greater depth in several works [7,8,9], usually considering resting-state acquisitions [10,11,12]. Indeed, the resting-state paradigm has the benefit of enabling the use of data acquired from any individual, including those with restricted mobility, as well as reducing the problem of movement artifacts, since the person must remain still during the EEG recording.

Among the studies that aim to obtain biometric information from EEG signals are those that have used characteristics extracted from specific electrodes, and those that have explored the relationships between EEG signals obtained by different electrodes, a method known as brain connectivity. Connectivity-based approaches assume that many brain functions are executed under a specific engagement of different brain regions [13] or networks. Thus, the understanding of how these interactions take place may play a key role in providing additional information regarding the individual.

Functional connectivity (FC) is a data-driven, exploratory method that seeks similarities between the dynamics of different brain regions. It establishes relationships between the regions, which can then be analyzed through graph theory [14]. To do so, the graph nodes and a similarity measure that provides the links between the nodes must be defined. For EEG data, the nodes are usually chosen to be the electrodes. In the context of using FC obtained from EEG to identify individuals, some similarity measures have already been explored, such as spectral coherence [9, 15, 16], the Spearman correlation applied to the Hilbert transform of the time series [17], the phase-locking value [17, 18], the imaginary part of the phase-locking value [19], the phase lag index [15], and mutual information applied to ordinal patterns [20], among others.

This work aims to analyze resting-state EEG from the perspective of graph-based measures in order to identify individuals. Two different FC similarity measures are used here: space-time recurrences [21] and motifs synchronization [22] based on ordinal patterns [23, 24]. It is important to stress that, to the best of our knowledge, neither similarity measure has yet been used in this context.

This article is divided as follows: Sect. 2 presents the EEG database used, the preprocessing steps and the two similarity methods used to evaluate the FC; Sect. 3 presents the identification results; and Sect. 4 presents a discussion about these and a brief conclusion of the work.

2 Materials and Methods

Figure 1 shows a flowchart summarizing the signal processing pipeline adopted in this work, including the chosen database, the preprocessing steps, the feature extraction approach, and the methodological analysis. All these steps are described in more detail in the following sections.

Fig. 1. Signal processing pipeline. The database consisted of 50 subjects with two EEG acquisitions each (R1 and R2). These were preprocessed, and 1 s or 5 s epochs were extracted from the time series (four epochs from R1 and one epoch from R2). FC matrices were calculated from these epochs using the motifs synchronization and space-time recurrences methods. A mean FC matrix from R1 was used as reference and the FC matrix from R2 was used as test. These were compared among all subjects using Pearson’s correlation coefficient.

2.1 Database and Preprocessing

The EEG data were taken from the Physionet database [25, 26], in which data from 109 subjects were recorded using a 64-channel EEG BCI2000 system, with electrodes positioned following the 10–10 system (Fig. 2). The subjects performed 14 experimental runs. The first two runs were acquired in resting state, for one minute each, with eyes open (R1 condition) and eyes closed (R2 condition), respectively. These runs were used in the analysis performed here.

After downloading the data in EDF format, the preprocessing was performed using EEGLAB [27] and consisted of four steps: first, removal of artifacts by visual inspection; second, decomposition of the data using Independent Component Analysis (ICA) and removal of undesired components; third, removal of alpha and power grid frequencies; fourth, Common Average Referencing (CAR) of the data [28].

In order to remove more blatant artifacts, the “Inspect/Reject data by eye” tool was used, in which sections of the signals could be marked for removal. Sections that had greater (at least five-fold) amplitude than the rest of the signal were removed.

Then, the signal was decomposed into independent components (ICA), using the tool “Decompose Data by ICA”. This tool displays the obtained independent components through scalp map projections of the EEG activity. Components related to muscle movements, eye blinks and other eye movements can be easily recognized, and were thus removed.

Next, the alpha band was removed using a stop-band filter (the “Basic FIR Filter” tool of EEGLAB, with an interval of 7 Hz to 13 Hz and the option “Notch filter the data instead of pass band” selected). This was done because we wanted to compare signals obtained from eyes-closed and eyes-open paradigms, and this band is known to differ strikingly between the two conditions. Finally, the signal was bandpass filtered between 4 and 50 Hz (with the same tool, but without selecting the option “Notch filter the data instead of pass band”) to eliminate low-frequency artifacts and high-frequency noise.

Fig. 2. 10–10 electrode positioning system. Obtained from https://upload.wikimedia.org/wikipedia/commons/3/38/International_10-20_system_for_EEG-MCN.png. Author: Brylie Christopher Oxley.

The final preprocessing step consisted of a spatial filter that re-referenced the signals using CAR [28]. This method computes, at each time sample, the mean of the signals over all electrodes and then subtracts this value from each electrode signal.
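As an illustration of the CAR step, the following minimal Python sketch subtracts the across-electrode mean from every channel at each sample. It is not the EEGLAB implementation used in this work; the function name `common_average_reference` and the toy data are hypothetical and serve only to make the operation concrete.

```python
def common_average_reference(eeg):
    """Re-reference EEG by subtracting the across-channel mean at each sample.

    eeg: list of channels, each a list of samples of equal length
         (hypothetical layout chosen for this sketch).
    """
    n_channels = len(eeg)
    n_samples = len(eeg[0])
    # Mean over electrodes, computed separately for every time sample.
    mean_per_sample = [
        sum(channel[n] for channel in eeg) / n_channels
        for n in range(n_samples)
    ]
    # Subtract that mean from each electrode signal.
    return [
        [channel[n] - mean_per_sample[n] for n in range(n_samples)]
        for channel in eeg
    ]


# Toy example: 3 channels, 3 samples.
toy = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 5.0, 8.0]]
car = common_average_reference(toy)
```

After re-referencing, the channels sum to zero at every sample, which is a quick sanity check for this kind of spatial filter.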

2.2 Functional Connectivity Matrices

The whole database was preprocessed using the four steps described above; however, only data from 50 subjects were used in this work. These subjects were selected according to the duration of the acquisitions after preprocessing: subjects with acquisitions shorter than 45 s were discarded.

From the R1 acquisition, four epochs were extracted, starting at seconds 10, 20, 30 and 40. From the R2 acquisition, only one epoch was extracted, starting at second 30. Lengths of 1 s and 5 s were tested for these epochs. Then, FC matrices were computed for both R1 and R2 epochs for all subjects, to be used as features in the identification problem. A template matching approach was used, in which the R1 matrices were further averaged to give one reference FC matrix per subject, while the R2 matrix was used as a test sample.
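The epoch extraction and template averaging described above can be sketched as follows. This is only an illustrative Python sketch, not the MATLAB code used in this work: the helper names `extract_epoch` and `mean_matrix`, the list-of-lists data layout, and the toy signal are assumptions made for the example; the 160 Hz sampling rate and the epoch start times come from the text.

```python
FS = 160  # sampling rate of the database, in Hz


def extract_epoch(signal, start_s, length_s, fs=FS):
    """Cut one epoch (all channels) starting at start_s seconds."""
    start = int(round(start_s * fs))
    stop = start + int(round(length_s * fs))
    if stop > len(signal[0]):
        raise ValueError("epoch exceeds the available recording")
    return [channel[start:stop] for channel in signal]


def mean_matrix(matrices):
    """Element-wise mean of equally sized square matrices."""
    n = len(matrices[0])
    count = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(n)]
        for i in range(n)
    ]


# Toy 2-channel recording of 45 s whose sample values equal their index.
toy_signal = [list(range(45 * FS)), list(range(45 * FS))]

# Four 1 s epochs from an R1-like run, one 5 s epoch from an R2-like run.
r1_epochs = [extract_epoch(toy_signal, start, 1) for start in (10, 20, 30, 40)]
r2_epoch = extract_epoch(toy_signal, 30, 5)

# Averaging per-epoch FC matrices gives one reference matrix per subject.
reference = mean_matrix([[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]])
```

In the actual pipeline, the matrices passed to the averaging step would be the FC matrices computed from each R1 epoch rather than the toy matrices shown here.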

Two different similarity methods were used to compute the FC matrices: motifs synchronization [22] and space-time recurrences [21]. Both methods were implemented in MATLAB (2018, Natick, Massachusetts: The MathWorks Inc). These methods are detailed in the following.

Motifs Synchronization.

A motif series is basically a series of behavior patterns in the EEG signal. In this work, motifs with three points were used, as in Fig. 3. Thus, a temporal series of an EEG electrode can be “translated” into a motif series, according to the types of motifs in the signal.

Fig. 3. Three-point motifs used in this work.

The motif series of two electrodes can be used to evaluate the similarity between signals considering different lag values. In this work, a lag \(t=0\) was used. Mathematically, the similarity between the motif series of electrodes \(i\) and \(j\) can be calculated using the coefficient \({c}_{ij}\), defined as follows [22]:

$${c}_{ij}={\sum_{k=1}^{{L}_{M}}}{J}_{k}$$
(1)

where \({L}_{M}\) is the motif series length, \({J}_{k}=1\) if the motif at position \(k\) is the same in both series, and \({J}_{k}=0\) otherwise.

Then, the degree of synchronization between electrodes \(i\) and \(j\) is calculated:

$${Q}_{ij}=\frac{{c}_{ij}}{{L}_{M}}$$
(2)

which varies from 0 to 1.

With that, an \(N\times N\) connectivity matrix is obtained, where \(N\) is the number of electrodes used for the acquisition (in this work, \(N=64\)), and each element of the matrix is the degree of synchronization between the row electrode and the column electrode.
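Equations (1) and (2) can be illustrated with a short Python sketch at zero lag. Since the motif shapes of Fig. 3 are not reproduced here, the sketch assumes that each three-point motif is encoded as the ordinal (rank) pattern of three consecutive samples, with ties resolved by time order; this encoding, like the function names, is an assumption of the example and may differ from the implementation in [22].

```python
def motif_series(x):
    """Translate a time series into a series of three-point motif labels.

    Assumption: a motif is the ordinal pattern of three consecutive
    samples, i.e. the window indices sorted by sample value.
    """
    motifs = []
    for k in range(len(x) - 2):
        window = x[k:k + 3]
        # sorted() is stable, so equal values keep their time order.
        motifs.append(tuple(sorted(range(3), key=lambda idx: window[idx])))
    return motifs


def motif_synchronization(x_i, x_j):
    """Degree of synchronization Q_ij between two series at zero lag."""
    m_i, m_j = motif_series(x_i), motif_series(x_j)
    l_m = len(m_i)  # motif series length L_M
    c_ij = sum(1 for a, b in zip(m_i, m_j) if a == b)  # Eq. (1)
    return c_ij / l_m  # Eq. (2), between 0 and 1


def motif_fc_matrix(eeg):
    """N x N matrix of Q_ij for a list of N electrode time series."""
    n = len(eeg)
    return [
        [motif_synchronization(eeg[i], eeg[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy data: a and b rise and fall together, c does the opposite.
a = [1.0, 2.0, 3.0, 2.5, 1.0]
b = [0.0, 1.0, 5.0, 4.0, 3.0]
c = [3.0, 2.0, 1.0, 2.5, 3.0]
fc = motif_fc_matrix([a, b, c])
```

Because only the shape of each three-sample window matters, `a` and `b` are fully synchronized despite their different amplitudes, while `a` and `c` share no motif.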

Space-Time Recurrences.

Space-time recurrences is a method used to identify whether a system has returned to a previous configuration during a given time period [29].

The space-time recurrence between two time series \({x}_{i}\) and \({x}_{j}\) is defined as:

$${STR}_{i,j}\left(\varepsilon , n\right)=\varTheta \left[\varepsilon - \left|{x}_{i}\left(n\right)-{x}_{j}\left(n\right)\right|\right]$$
(3)

The structure \(STR\) is called the space-time recurrence matrix: a three-dimensional data structure of size \(N\times N\times {N}_{s}\), where \(N\) is the number of channels (or electrodes; in this case, \(N=64\)) and \({N}_{s}\) is the total number of samples in the chosen time frame (\({N}_{s}=160\) for 1 s frames or \({N}_{s}=800\) for 5 s frames, since the sampling rate was \(160\) Hz). \(\varTheta \) is the Heaviside function, that is, \(\varTheta \left(x\right)= 0\) if \(x<0\) and \(\varTheta (x)= 1\) if \(x\ge 0\). Finally, \(\varepsilon \) is a distance threshold that must be chosen by the user. In the present work, \(\varepsilon \) was set to \(50\%\) of the maximum distance (\(|{x}_{i}(n) -{x}_{j}(n)|\)) between electrode time series.

From the \(STR\) it is possible to calculate the connectivity matrix, which consists in normalizing the sum of the values of each electrode pair in the \({STR}_{i,j}\) structure:

$${A}_{i,j}=\frac{1}{{N}_{s}}\sum_{n=1}^{{N}_{s}}{STR}_{i,j}(\varepsilon ,n)$$
(4)

Thus, it is possible to reduce the dimension of the problem, since \({A}_{i,j}\) is a two dimensional \(N\times N\) matrix. Each element of the matrix describes the similarity between two temporal series of EEG.
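The computation of the recurrence-based connectivity matrix can be sketched in Python as below. This is a hedged illustration rather than the MATLAB implementation used here: the recurrence indicator is averaged over the samples of the epoch, and the threshold is assumed to be a fraction of the largest distance found over all electrode pairs and samples, which is one possible reading of the 50% criterion. The function name `str_fc_matrix` and the parameter `eps_fraction` are hypothetical.

```python
def str_fc_matrix(eeg, eps_fraction=0.5):
    """Connectivity matrix A from space-time recurrences.

    eeg: list of N electrode time series with N_s samples each.
    eps_fraction: threshold as a fraction of the maximum distance
                  (assumed here to be taken over all pairs and samples).
    """
    n = len(eeg)
    n_s = len(eeg[0])
    max_dist = max(
        abs(eeg[i][t] - eeg[j][t])
        for i in range(n) for j in range(n) for t in range(n_s)
    )
    eps = eps_fraction * max_dist
    fc = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Heaviside(eps - d) equals 1 when the distance d is <= eps.
            recurrences = sum(
                1 for t in range(n_s)
                if eps - abs(eeg[i][t] - eeg[j][t]) >= 0
            )
            # Average the recurrence indicator over the epoch samples.
            fc[i][j] = recurrences / n_s
    return fc


# Toy data: 3 electrodes, 4 samples; the maximum distance is 4, so eps = 2.
x0 = [0.0, 0.0, 0.0, 0.0]
x1 = [0.0, 1.0, 2.0, 4.0]
x2 = [4.0, 4.0, 4.0, 4.0]
fc = str_fc_matrix([x0, x1, x2])
```

In this toy case, `x0` and `x1` are within the threshold in three of four samples, giving an entry of 0.75, while `x0` and `x2` are never within it.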

2.3 Comparison by Pearson Correlation Coefficient

To evaluate the similarity among the signals, and thus identify a given subject, the Pearson correlation coefficient was calculated between the mean R1 (eyes open) matrix and the R2 (eyes closed) matrix of all subjects.

If the highest correlation value was for R1 and R2 of the same individual, it was possible to identify the person, because it indicated greater similarity between different acquisitions of the same person. If not, it was not possible to identify the person.

Finally, the methods were compared in terms of their hit rate, or accuracy (i.e., percentage of correctly classified individuals).
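The template-matching procedure can be sketched as follows. In this illustrative Python sketch, each FC matrix is flattened into a vector (including the diagonal) before the Pearson correlation is computed; the flattening choice, the function names, and the toy matrices are assumptions made for the example and are not taken from the original implementation.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def flatten(matrix):
    """Turn a matrix (list of rows) into a single vector."""
    return [value for row in matrix for value in row]


def identify(references, tests):
    """Match each test matrix to the most correlated reference matrix.

    references[s] and tests[s] belong to subject s.
    Returns the predicted subject index per test and the accuracy.
    """
    predictions = []
    for test in tests:
        scores = [pearson(flatten(ref), flatten(test)) for ref in references]
        predictions.append(max(range(len(scores)), key=scores.__getitem__))
    hits = sum(1 for subject, pred in enumerate(predictions) if pred == subject)
    return predictions, hits / len(tests)


# Toy data: 3 subjects with 2 x 2 matrices (not real FC values).
references = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[4.0, 3.0], [2.0, 1.0]],
    [[1.0, 3.0], [2.0, 4.0]],
]
tests = [
    [[1.1, 2.0], [2.9, 4.2]],
    [[3.9, 3.1], [2.2, 0.8]],
    [[1.0, 2.2], [2.8, 4.0]],  # closer to subject 0 than to subject 2
]
predictions, accuracy = identify(references, tests)
```

The third toy subject is deliberately misidentified, which shows how a hit rate below 100% arises when another subject's reference correlates more strongly with the test matrix than the subject's own reference.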

3 Results

Table 1 shows the accuracy values obtained for subject identification, for each FC method and epoch length.

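Table 1. Subject identification accuracies for all combinations of functional connectivity methods and epoch lengths.

FC method                 1 s epochs     5 s epochs
Motifs synchronization    48% (24/50)    48% (24/50)
Space-time recurrences    36% (18/50)    38% (19/50)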

Using motifs synchronization, 24 of the 50 analyzed individuals were correctly identified for both epoch lengths (1 s and 5 s), which corresponds to an accuracy of 48% in both cases. Interestingly, the epoch length did not seem to influence the performance of this method.

The space-time recurrences method was able to correctly identify 18 out of 50 individuals for FC matrices computed using 1 s data, which corresponds to 36% accuracy, and for 5 s matrices it could identify 19 individuals among 50, which corresponds to a 38% accuracy. Therefore, the results using 5 s epochs to compute the FC matrices were slightly better.

4 Discussion

Regarding a comparison between methods, the motifs method achieved better accuracy than the space-time recurrences method, for all epoch lengths. This indicates that the motifs method was more capable than the recurrences method of extracting relevant information from the EEG signals regarding individual traits. The motifs method has been shown to be more efficient than other usual EEG FC methods, such as mean squared coherence and imaginary coherence, for extracting relevant information regarding interictal epileptiform discharges [30].

Nevertheless, the accuracies obtained with both methods used here, for all epoch lengths, were too low for practical purposes. Indeed, in [9], La Rocca and colleagues achieved up to 100% recognition rates with this same database, using features obtained both from power spectral density (PSD) and from coherence-based connectivity. They looked at individual electrode PSD features and individual channel (electrode pair) coherence features, and then combined the features from a given region (e.g., central, parietal or frontal) and fed them to a classifier based on the Mahalanobis distance. However, it is important to note that they only compared epochs within a given acquisition (eyes open or eyes closed); they did not attempt to use one acquisition to predict the other, as done here.

This work has several limitations. The number of subjects was low for the type of application (biometrics). Notwithstanding, it is important to note that when the number of subjects is increased, the accuracy tends to decrease, since more comparisons are made and the chance that some other subject yields a correlation coefficient higher than that of the right person increases. We previously tested the method with a sample of 11 subjects and the accuracies were indeed much better (64% for both methods).

The number of acquisitions per subject was also low. Additionally, the two acquisitions used did not follow exactly the same conditions: although both were in resting state, one was acquired with eyes open and the other with eyes closed. Closing one’s eyes is known to increase the amplitude of alpha band oscillations in EEG signals. In a first analysis (not reported here) we attempted to use these different signals without removing the alpha band, but the results were worse than the ones reported here.

Also, the first preprocessing step (artifact removal by visual inspection) is somewhat subjective and may not have been exactly the same for all signals. Additionally, the STR requires adjusting the recurrence threshold for optimum FC evaluation [22]; a more detailed analysis, using a separate dataset for hyperparameter tuning, is a natural next step.

Nevertheless, it is important to highlight that the method presented may be taken further by exploring different options in each step of the methodology. Preprocessing could benefit from an automatic artifact removal algorithm such as SOUND [31], which would take away the subjectivity of removing signal stretches and ICA components by simple inspection. Also, other types of referencing methods, such as REST [32], could be tried instead of CAR. In the feature extraction step, graph parameters computed from the FC matrices could be explored. The motifs method could be improved by looking at delays other than zero, as in [22], while STR can be improved by means of threshold adaptations. Finally, in the classification step, a very simple classification method was used, namely, the Pearson correlation coefficient, but comparatively more sophisticated classification approaches could be investigated, such as Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) or even deep neural networks.

In conclusion, both methods of FC calculation, motifs synchronization and space-time recurrences, produced accuracies below what would be required for reliable subject identification. That said, these results were well above the chance level (which, for 50 subjects, would have been 2%), showing that the methods have potential for this application. Also, our results were obtained by attempting to match two signals acquired at different moments and under different conditions, while other works in the literature using similar approaches have compared only signal epochs within the same acquisition (and condition). Finally, this was a pilot study, which aimed to explore the use of two FC measures that, to the best of our knowledge, had not yet been applied to EEG-based biometrics.