1 Introduction

Electroencephalography (EEG) based biometry is a growing area of research [1, 2] for user recognition in security systems, since it provides signals that can only be obtained from live individuals and that vary according to different types of stimuli. The first works on the subject [3, 4] proposed the use of spectral decomposition as a measure for identification, and an accuracy of almost \(90\%\) was obtained in [4]. Since then, other methodologies have been applied, combining different tasks and feature extraction methods. Some combinations achieved \(100\%\) accuracy, such as the use of averaged event-related potentials (ERPs) from visual stimulation [5], correlation between ERPs elicited by rapid serial visual presentation (RSVP) [6] and spectral coherence (COH) from the resting state with eyes open and closed [7]. Some effects caused by an increase in the number of subjects, their gender or a change in the age group are discussed in [8], which shows that accuracy levels can depend strongly on these parameters. This indicates the need for more works on the subject with diverse populations and methodologies.

On the other hand, EEG has been one of the most widely used neuroimaging techniques to aid in the diagnosis of epilepsy [9]. This is mainly due to its relatively low cost, portability and high temporal resolution compared to other techniques. The EEG exam can detect physiological manifestations underlying epileptic activity, although only interictal epileptiform discharges (IEDs) are of clinical use [9]. In addition, it has been suggested that epilepsy interferes with functional brain networks [10,11,12,13]. These networks are characterized through functional connectivity analysis, where a similarity measure is used to compare activity time series from different brain regions (for a review, see, e.g., [14]). Recently, Nentwich et al. showed that EEG functional connectivity is subject-specific and depends on the phenotype [15]. They report that the connectivity patterns they found were more similar across tasks than across individuals, and state that “functional connectivity can be used as a diagnostic metric to assess individuals” [15].

In this context, the aim of this work was to perform an individual characterization of epilepsy patients using different connectivity measures and methodologies, and to compare their performances in a biometry scenario. Magnitude-squared coherence (COH) has already been used for EEG biometry in [7], achieving \(100\%\) accuracy, and the use of other measures such as imaginary coherence (ICOH), motif-synchronization (MS) [16] and space-time-recurrence (STR) [17] is proposed here along with COH. In addition to these measures, we also vary the pre-processing steps for the signals, filtering in the intervals 1–40 Hz and 1–100 Hz and segmenting into epochs of 3 s and 10 s. The frequency band 1–40 Hz was chosen due to its common use in connectivity studies, as it covers the low frequency bands, especially the alpha band, whose alterations have been associated with epilepsy [18, 19]. The 1–40 Hz range was also used in another EEG-based authentication study [7]. The second frequency band, 1–100 Hz, was chosen so that its upper limit is close to the maximum available frequency (the Nyquist frequency of 125 Hz) given the sampling rate of 250 Hz. As for the epoching choices, the 10 s segmentation was based on previous works in the area [6, 7, 20]. The shorter segmentation was set to 3 s, as it is the shortest interval that gives precise estimations of coherence at low frequencies, and it is recommended for security systems with a required true acceptance rate equal to or higher than \(90\%\) [6]. As a result, four methodologies (two filtering ranges combined with two epoch lengths) were applied to each of the four measures, producing sixteen different classifiers. This is a pilot study; at this point, we have not yet investigated the association between epilepsy phenotype and EEG connectivity.

2 Subjects, Materials and Methods

2.1 Data Acquisition and Pre-processing

Scalp EEG signals were obtained from volunteer epilepsy patients undergoing presurgical evaluation at the Neuroimaging Laboratory (LNI) at Unicamp. EEG data were acquired in the resting state simultaneously with functional magnetic resonance imaging (fMRI) data, using a magnetic resonance (MR) compatible EEG system (BrainProducts GmbH, München, Germany) consisting of two BrainAmp MRplus amplifiers and a 64-electrode brain cap (including one electrocardiogram electrode), with electrodes positioned following the 10/10 system [21]. The sampling rate was 5 kHz, with the reference on FCz and the ground on AFz.

The criteria for inclusion in the study were a total acquisition time greater than or equal to 600 s and fewer than 30 epileptiform events, as marked by neurophysiologists. Based on these criteria, we selected signals from ten subjects (mean age \(41.9\,\pm 12.3\) years, 6 female). The EEG signals were collected in a single session (day) for each subject, and all the selected subjects had two or more acquisitions (trials) during the same session. Table 1 shows the number of trials used for each patient and their diagnosis, which can be temporal lobe epilepsy (TLE) or frontal lobe epilepsy (FLE), together with the affected brain hemisphere.

Table 1. Number of trials of each patient and diagnosis when available.

The pre-processing steps were the following: MR gradient artifact correction using MR trigger synchronism; average artifact subtraction correction; ballistocardiogram correction for the scalp channels; downsampling of the data to 250 Hz; and discarding of epileptiform events and the ECG channel, in order to retain only signals with regular brain activity. Manual cleaning and an independent component analysis (ICA) decomposition were also performed, to discard noisy fragments and to remove blink components from the data, respectively. The ICA decomposition and the rejection of components were performed with the fastICA algorithm and the ICLabel extension [22], both implemented in EEGLAB [23]. EEG data were then re-referenced to the average of all electrodes and filtered in the frequency ranges of 1–40 Hz and 1–100 Hz. Since we wanted a total of 600 s of signal, we first selected 300 s from the first trial of each patient to compose the training dataset. Then, to compose the test dataset, we selected 300 s from the second trial for the patients with two trials, and 150 s from the second trial plus 150 s from the third trial for the remaining patients. Finally, these segments were divided into epochs of 3 s and 10 s.
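The last two steps (band-pass filtering and epoching) can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the filter design, so a zero-phase fourth-order Butterworth filter from SciPy is used here as a stand-in, and the function names `bandpass` and `epoch` are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250  # sampling rate after downsampling (Hz), as stated in the text


def bandpass(data, low, high, fs=FS, order=4):
    """Zero-phase Butterworth band-pass along the time axis.

    The filter family and order are assumptions; the paper does not state them.
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=-1)


def epoch(data, epoch_s, fs=FS):
    """Split (channels, samples) into (n_epochs, channels, samples_per_epoch).

    Samples that do not fill a whole epoch are dropped.
    """
    step = int(epoch_s * fs)
    n_epochs = data.shape[-1] // step
    trimmed = data[:, : n_epochs * step]
    return trimmed.reshape(data.shape[0], n_epochs, step).swapaxes(0, 1)


# Example with synthetic data standing in for a 300 s segment
rng = np.random.default_rng(0)
segment = rng.standard_normal((63, 300 * FS))
filtered = bandpass(segment, 1, 40)
print(epoch(filtered, 3).shape, epoch(filtered, 10).shape)
```

With a 300 s segment, this yields 100 epochs of 3 s and 30 epochs of 10 s, which matches the epoch counts used for the reference and test vectors.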

The study was approved by the ethics committee of our institution (CAAE 16715319.9.0000.5404, CEP-UNICAMP), and all subjects signed an informed consent form prior to data acquisition.

2.2 Connectivity Measures

Coherence. Coherence is a measure that quantifies the level of similarity between signals with respect to their frequency and amplitude [24]. It is a common technique to study brain connectivity from EEG signals since it gives the synchrony in a chosen specific frequency range between distinct regions of the brain. The magnitude-squared coherence between two signals i and j for a frequency f is given by the formula

$$\begin{aligned} COH_{ij}(f) = \frac{|S_{ij}(f)|^2}{S_{ii}(f)S_{jj}(f)}, \end{aligned}$$
(1)

where \(S_{ij}\) is the cross spectral density between the two signals and \(S_{ii}\) and \(S_{jj}\) are the spectral densities of each signal. Another common way to express coherence is through its imaginary part, which reduces the contamination of the estimate by volume conduction [25]. The expression for the imaginary coherence depends on the imaginary part of the cross spectral density, and is given by

$$\begin{aligned} ICOH_{ij}(f) = \frac{\mathrm {Im}(S_{ij}(f))}{\sqrt{S_{ii}(f)S_{jj}(f)}}. \end{aligned}$$
(2)

To build the reference vectors, the magnitude-squared and imaginary coherence were calculated for each epoch of the first trial (\(n=30\) epochs for segments of 10 s and \(n=100\) epochs for segments of 3 s) over the frequency ranges 1–40 Hz and 1–100 Hz. The analysis was performed with Brainstorm [26], an open-source application for the analysis and processing of brain recordings, with a maximum frequency resolution of 1 Hz and an overlap of \(50\%\) for power spectral density (PSD) estimation. The resulting coherence matrices were then averaged over the n epochs and over the frequencies, and the diagonal and the entries on one side of it were arranged into a \(2016 \times 1\) vector (\(2016 = N(N-1)/2+N\), where \(N=63\) is the number of electrodes) for each subject. For the test vectors, COH and ICOH were calculated for each epoch from the second trial or the second and third trials, and the coherence matrices were averaged only over the frequencies. This resulted in \(n = 100\) and \(n = 30\) test vectors of size \(2016 \times 1\) for epochs of 3 and 10 seconds, respectively, for each subject and frequency range.
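As a rough illustration of Eqs. (1) and (2), the sketch below estimates the cross spectral density with a Welch-type average (1 s Hann windows, giving 1 Hz resolution, with \(50\%\) overlap), averages COH and ICOH over a frequency band, and extracts the 2016-element vector. It is not Brainstorm's implementation: the window type, the mean removal per segment and the function names (`cross_spectra`, `coh_icoh`, `to_vector`) are assumptions made for this example.

```python
import numpy as np


def cross_spectra(x, fs=250, nperseg=250, overlap=0.5):
    """Welch-type cross spectra S[i, j, f] for x of shape (channels, samples)."""
    step = int(nperseg * (1 - overlap))
    starts = range(0, x.shape[1] - nperseg + 1, step)
    segs = np.stack([x[:, s:s + nperseg] for s in starts])  # (n_seg, ch, nperseg)
    segs = segs - segs.mean(axis=-1, keepdims=True)          # remove segment mean
    spec = np.fft.rfft(segs * np.hanning(nperseg), axis=-1)  # (n_seg, ch, n_freq)
    # Average X_i(f) * conj(X_j(f)) over segments; the overall scale cancels
    # in the coherence ratios, so no PSD normalization is applied.
    S = np.einsum("sif,sjf->ijf", spec, np.conj(spec)) / len(starts)
    return np.fft.rfftfreq(nperseg, d=1 / fs), S


def coh_icoh(x, band=(1, 40), fs=250):
    """Band-averaged magnitude-squared and imaginary coherence matrices."""
    freqs, S = cross_spectra(x, fs=fs, nperseg=fs)
    S = S[..., (freqs >= band[0]) & (freqs <= band[1])]
    psd = np.einsum("iif->if", S).real                       # S_ii(f)
    denom = psd[:, None, :] * psd[None, :, :]                # S_ii(f) * S_jj(f)
    coh = (np.abs(S) ** 2 / denom).mean(axis=-1)             # Eq. (1)
    icoh = (S.imag / np.sqrt(denom)).mean(axis=-1)           # Eq. (2)
    return coh, icoh


def to_vector(mat):
    """Diagonal plus entries below it: N(N-1)/2 + N values."""
    rows, cols = np.tril_indices(mat.shape[0])
    return mat[rows, cols]


rng = np.random.default_rng(0)
one_epoch = rng.standard_normal((63, 750))                   # synthetic 3 s epoch
coh, icoh = coh_icoh(one_epoch, band=(1, 40))
print(to_vector(coh).shape)
```

A reference vector would then be the average of such vectors over the epochs of the first trial, while each test vector comes from a single epoch. Note that the ICOH matrix is antisymmetric, so the choice of triangle fixes the sign of the entries.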

Fig. 1. Transformation of a randomly generated signal X into a series of motifs \(X_M\), with unitary lag.

Motif-Synchronization. The motif technique considers an original signal X as a sequence of predetermined elementary patterns that are used to transform the signal into a sequence of labels \(X_M\), as depicted in Fig. 1. This method was originally proposed to perform a study on permutation entropy in EEG data [27], and a more recent work proposed the use of motifs for a connectivity measure, called Motif-Synchronization [16]. The objective of the method is to obtain the synchrony between the signals of two sources by counting the simultaneous appearance of the defined patterns. After performing the transformation of the signal, the following variable is evaluated for each pair of sources

$$\begin{aligned} c_{XY} = \mathrm {max}\, \left( \sum _{i=1}^{L_m}J_i^{\tau _0}, \, \sum _{i=1}^{L_m}J_i^{\tau _1}, \dots , \sum _{i=1}^{L_m}J_i^{\tau _n}\right) , \end{aligned}$$
(3)

where

$$\begin{aligned} J_i^\tau = {\left\{ \begin{array}{ll} 1, &{} \quad \text {if } X_M(i) = Y_M(i+\tau ) \\ 0, &{} \quad \text {else}. \end{array}\right. } \end{aligned}$$
(4)

In the expressions above, \(L_m\) is the number of selected points from the time series and \(\tau \) is the time delay ranging from \(\tau _0 = 0\) to a maximum value \(\tau _n\) to be chosen. The connectivity matrix is then obtained from the synchronization degree of each pair, given by

$$\begin{aligned} Q_{XY} = \frac{\text {max}\, (c_{XY},c_{YX})}{L_m}, \end{aligned}$$
(5)

which can assume values between 0 (no synchronization) and 1 (maximum synchronization).

For this work, we transformed the original signals into motifs using three-point patterns and unitary lag, in which the last two points of a pattern overlap with the next one (see Fig. 1). The maximum delay was set to \(\tau _n = 4\), corresponding to 16 ms in the data. The \(n = 100\) (for epochs of 3 s) and \(n = 30\) (for epochs of 10 s) connectivity matrices from the first trial were averaged to form the reference vector (of dimensions \(N^2 \times 1\)), and the matrices from the second trial or the second and third trials were used as test vectors.
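Equations (3) to (5) can be illustrated with the minimal sketch below. The integer coding of the six three-point ordinal patterns, the handling of ties, and the choice \(L_m = \text{(number of motifs)} - \tau _n\) so that every delayed index stays in range are assumptions of this example, not details taken from [16]; the function names are hypothetical.

```python
import numpy as np


def to_motifs(x, lag=1):
    """Label every three-point window of x by its ordinal pattern."""
    a, b, c = x[:-2 * lag], x[lag:-lag], x[2 * lag:]
    # The three pairwise comparisons identify one of the six orderings.
    # The integer coding (and the treatment of ties) is arbitrary here.
    return 4 * (a < b) + 2 * (b < c) + (a < c)


def count_matches(xm, ym, max_delay):
    """c_XY of Eq. (3): best count of X_M(i) == Y_M(i + tau) over the delays."""
    L = len(xm) - max_delay
    return max(int(np.sum(xm[:L] == ym[tau:tau + L]))
               for tau in range(max_delay + 1))


def motif_sync(x, y, max_delay=4):
    """Q_XY of Eq. (5), a value between 0 and 1."""
    xm, ym = to_motifs(x), to_motifs(y)
    L = len(xm) - max_delay
    return max(count_matches(xm, ym, max_delay),
               count_matches(ym, xm, max_delay)) / L


def ms_matrix(data, max_delay=4):
    """N x N matrix of Q_XY for data of shape (channels, samples)."""
    n = data.shape[0]
    q = np.ones((n, n))  # a signal is fully synchronized with itself
    for i in range(n):
        for j in range(i + 1, n):
            q[i, j] = q[j, i] = motif_sync(data[i], data[j], max_delay)
    return q


rng = np.random.default_rng(0)
sig = rng.standard_normal(500)
# A copy delayed by two samples is recovered as fully synchronized
print(motif_sync(sig[2:], sig[:-2]))
```

Flattening the matrix returned by `ms_matrix` gives an \(N^2 \times 1\) vector, as used for the MS reference and test vectors.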

Space-Time Recurrence. The space-time recurrence technique for connectivity is based on recurrence plots (RP) [17], a powerful tool in the analysis of complex systems that indicates the level of proximity between dynamical states. The recurrence in space and time for a pair of signals can be computed as [28]

$$\begin{aligned} STR_{ij}(\epsilon , t) = \theta \, (\epsilon - \Vert x_i(t) - x_j(t)\Vert _1), \end{aligned}$$
(6)

where \(\theta \) is the Heaviside function, \(\epsilon \) is a threshold value for the distance and t is the index of the sample (time). For N sources of signals, we have an \(N\times N\times T\) matrix, with T the total number of samples. To obtain a recurrence for a time period, we can define a density matrix of the form

$$\begin{aligned} Den_{ij} = \frac{1}{T}\sum _{t=1}^T STR_{ij}(\epsilon , t), \end{aligned}$$
(7)

which assumes values from 0 to 1 and gives a space-time recurrence average through time.

For this work, the density matrix (7) was used to build the reference and test vectors for classification. Since Den is symmetric, only the entries below the diagonal and the diagonal were used, resulting in vectors of \(2016 \times 1\) as in the coherence measures. For the reference vectors, the density matrices of all epochs from the first trial were averaged, and the matrices from epochs of the other trials were considered as test vectors. Although many methods for the choice of the distance threshold value have been proposed [29, 30], in this work the values of \(\epsilon \) for each case were chosen according to the best classification results.
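A minimal sketch of Eqs. (6) and (7) is given below. It assumes the convention \(\theta (0) = 1\), so that a distance exactly equal to \(\epsilon \) counts as a recurrence, and it applies the threshold to the signals as given; whether any amplitude normalization precedes this step is not stated in the text. The function names are hypothetical.

```python
import numpy as np


def str_density(x, eps):
    """Density matrix Den_ij of Eq. (7) for x of shape (channels, samples)."""
    n = x.shape[0]
    den = np.empty((n, n))
    for i in range(n):
        # STR_ij(eps, t) = 1 where |x_i(t) - x_j(t)| <= eps (theta(0) = 1 assumed),
        # then averaged over the T samples.
        den[i] = (np.abs(x - x[i]) <= eps).mean(axis=1)
    return den


def lower_triangle_vector(mat):
    """Diagonal plus entries below it, as used for the feature vectors."""
    rows, cols = np.tril_indices(mat.shape[0])
    return mat[rows, cols]


rng = np.random.default_rng(0)
one_epoch = rng.standard_normal((63, 750))  # synthetic 3 s epoch at 250 Hz
den = str_density(one_epoch, eps=0.5)       # eps value chosen arbitrarily here
print(lower_triangle_vector(den).shape)
```

Looping over one channel at a time avoids building the full \(N\times N\times T\) array in memory, at the cost of a Python-level loop over N.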

2.3 Classification

Once the reference and test vectors were built and labeled with their respective subjects, classification was performed in the same way for all the connectivity measures. First, the Euclidean distance between each test vector \(\boldsymbol{t}_i\) and each reference vector \(\boldsymbol{r}_j\) was calculated by the expression

$$\begin{aligned} D_{ij} = \Vert \boldsymbol{t}_i - \boldsymbol{r}_j \Vert _2. \end{aligned}$$
(8)

This distance matrix has the dimensions \(n \times N_{subjects}\), where \(n = 30\) for epochs of 10 s, \(n = 100\) for epochs of 3 s and \(N_{subjects} = 10\). Each test vector was assigned to the subject whose reference vector yielded the minimum distance, and the classification results were compared with the original labels. The accuracy was then given by the ratio between the number of correct classifications and n.
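The minimum-distance rule of Eq. (8) can be sketched as follows, with synthetic vectors standing in for the connectivity features. The function names and the row-normalized confusion matrix helper are illustrative assumptions; the helper assumes every subject has at least one test vector.

```python
import numpy as np


def classify(test_vecs, ref_vecs):
    """Assign each test vector to the subject with the closest reference.

    test_vecs: (n_test, d); ref_vecs: (n_subjects, d).
    Returns the predicted subject indices and the distance matrix D of Eq. (8).
    """
    diff = test_vecs[:, None, :] - ref_vecs[None, :, :]
    D = np.linalg.norm(diff, axis=-1)  # Euclidean distance
    return D.argmin(axis=1), D


def accuracy(pred, true):
    """Fraction of test vectors assigned to the correct subject."""
    return float(np.mean(pred == true))


def confusion_percent(pred, true, n_subjects):
    """Rows: true subject; columns: percentage assigned to each subject."""
    cm = np.zeros((n_subjects, n_subjects))
    np.add.at(cm, (true, pred), 1)
    return 100 * cm / cm.sum(axis=1, keepdims=True)


# Synthetic example: 10 subjects, 30 test vectors each (10 s epochs)
rng = np.random.default_rng(0)
refs = rng.standard_normal((10, 2016))
labels = np.repeat(np.arange(10), 30)
tests = refs[labels] + 0.1 * rng.standard_normal((labels.size, 2016))
pred, D = classify(tests, refs)
print(accuracy(pred, labels))
```

Stacking the test vectors of all subjects, as done here, gives one distance matrix for the whole test set; restricting the rows to a single subject gives the \(n \times N_{subjects}\) matrix described above.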

3 Results and Discussion

As can be seen in Figs. 2, 4, and 5, the connectivity matrices for COH, MS and STR present subtle variations that are not easily distinguishable, at least visually. The imaginary coherence maps exhibit more variety as can be seen in Fig. 3, where the maps generated by data segmented into epochs of 10 s have lower values in general.

Fig. 2. Connectivity matrices for the reference vector from subject 1, with values of the magnitude-squared coherence (1) (COH).

Fig. 3. Connectivity matrices for the reference vector from subject 1, with values of the imaginary coherence (2) (ICOH).

Fig. 4. Connectivity matrices for the reference vector from subject 1, with values of the degree of synchronization \(Q_{XY}\) (5) using MS.

Fig. 5. Connectivity matrices for the reference vector from subject 1, with values of the density matrices (7) using STR (the off-diagonal entries were rescaled from 0 to 1 for better visualization).

The classification accuracies are presented in Table 2. It can be seen that COH, ICOH and MS vary strongly with the range of filtering chosen, with a difference of up to \(24\%\) in classification accuracy for COH. The variation for STR is less significant, but the accuracies for the filtering range 1–40 Hz are still better. These results indicate that the most relevant signals for subject distinction are contained in the lower frequency bands, including the \(\alpha \) and \(\beta \) bands which are related to relaxed awareness and concentration [31]. A more restricted filtering also provides the elimination of possible high-frequency artifacts that can harm the quality of the data.

Table 2. Classification accuracy (in percentage) obtained from the different measures. The parentheses in the STR line contain the values for \(\epsilon \) that gave the best accuracy.

As for the epoch size, the 10 s segmentation resulted in higher accuracy in the majority of the cases, producing a difference of at most \(5\%\) for MS and STR. A better performance was expected with the 10 s segmentation, since connectivity measures computed over longer periods of time are less susceptible to disruption by momentary movement artifacts and cognitive processes. However, some of the accuracies for 3 s were still higher, and periods longer than 10 s can be studied to verify whether this improvement is relevant.

Fig. 6. Confusion matrices for the classifiers with COH measures. A row contains the percentage of the samples from one class attributed to each of the classes.

Fig. 7. Confusion matrices for the classifiers with ICOH measures. A row contains the approximate percentage of the samples from one class attributed to each of the classes.

The good performance of magnitude-squared coherence corroborates the results of [7], where high accuracies were obtained with both eyes-closed and eyes-open acquisitions. To the best of our knowledge, no other works have used MS or STR for EEG-based biometry, but both measures have already been used in connectivity studies [16, 28] with good results. Our results for MS reveal that this measure is a good candidate for distinguishing between subjects, alongside COH.

Fig. 8. Confusion matrices for the classifiers with MS measures. A row contains the approximate percentage of the samples from one class attributed to each of the classes.

Fig. 9. Confusion matrices for the classifiers with STR measures. A row contains the percentage of the samples from one class attributed to each of the classes.

As can be seen in the confusion matrices in Figs. 6, 7, 8 and 9, patients 3, 4, 6, 9, and 10 have a correct classification rate less than or equal to \(50 \%\) for at least one of the connectivity measures. Patient 4 has the lowest hit rates in general, and is most often confused with patients 9 and 2 in some of the measures. The remaining patients with poor hit rates are confused with different subjects depending on the connectivity measure. In addition, the patterns of classification seem to repeat for the same measure and filtering range, and do not vary much across the different epoch segmentations.

Relevant limitations of this work were the small number of subjects whose EEG signals were appropriate for our analysis and the use of EEG signals acquired jointly with fMRI data, which have more artifacts than regularly acquired signals. However, the data used are maintained for diverse scientific purposes, which include EEG-fMRI investigation of epilepsy patients, a goal towards which we believe this work will be useful in the future.

4 Conclusion

The approach proposed here aimed to study different connectivity measures and methodologies for EEG-based biometry of epilepsy patients, and to compare their performances. For our subjects and method of classification, COH and MS measures obtained from epochs of 10 s extracted from the original signals filtered in the 1–40 Hz range resulted in the highest classification accuracy. We also found that STR and MS can result in classifications as good as or even better than COH and ICOH, depending on the methodology and pre-processing steps.

A first modification in the continuation of this work will be to include a larger number of subjects, which can make the results more reproducible and reliable. Other improvements include the use of more robust classification methods, exploration of the lag parameter for MS, which was held constant here, and determination of which electrodes are more relevant for classification, in order to reduce the dimensions of the feature and test vectors. Finally, once we are able to increase the patient sample, a future direction will be to explore the association of epilepsy phenotype and diagnosis with EEG functional connectivity.