
1 Introduction

Epilepsy is one of the most common chronic neurological disorders, affecting approximately 50 million people worldwide [1]. This abnormal brain activity can cause clinical manifestations that include involuntary movements, distorted perceptions, loss of consciousness, permanent brain damage, and, in the most severe cases, sudden unexpected death in epilepsy (SUDEP) [2, 3, 4]. Although anti-epileptic drugs control many cases of epilepsy, 30% of patients still suffer from uncontrolled seizures [4]. Alternative treatments include surgical intervention and neurostimulation, but some patients are not eligible for them [5].

Epileptic seizure prediction methods are based on two signals: the electroencephalogram (EEG) and the electrocardiogram (ECG) [6, 5, 2]. The most common EEG-based methods require electrodes attached to the patient's scalp [7], which is impractical for wearable devices. Other methods rely on surgical procedures to map the epileptic focus, and the potential treatments carry the risk of brain damage and infection [5].

Given these limitations of EEG-based methods, and with the aim of making portable devices available to patients, heart rate variability (HRV) analysis emerges as a promising alternative [8]. The ECG signal is a natural choice because it is easier and cheaper to acquire and process, which makes it suitable for detection and prediction systems and comfortable for patients. Furthermore, a recent study [9], whose analysis included preictal periods, found no evidence of significant differences in HRV metrics across epilepsy types. This makes the method particularly attractive compared with EEG approaches. Figure 1 depicts the concept of a wearable device split into two parts. The first part consists of a dedicated signal-conditioning circuit, summarized here as an analog front-end (AFE), and a heartbeat detector. The identified heartbeats are then sent to a microcontroller, such as the ESP32®, which runs the second part: the AI algorithms for seizure prediction.

Fig. 1. Application.

The EEG-based state of the art is replete with studies that compare AI algorithms to find the best one for seizure prediction; examples include [6, 10, 11, 13]. To the best of the authors' knowledge, although there are several EEG studies using machine learning models, a comprehensive evaluation of AI models for ECG-based seizure prediction is still lacking. Among the best examples in this area are [5, 14], both of which use Support Vector Machine classifiers (SVC) and neither of which considers the requirements of real-time operation.

This work continues our previous work [15], which covered the design and fabrication of an integrated circuit for ECG acquisition and heartbeat extraction for HRV applications. The main objective of this study is to compare trained and validated classifier models and to select the one best suited for wearable HRV-based seizure prediction.

2 Material and Methods

Although research on EEG-based prediction is extensive, there is no consensus on the duration of the prediction phase. Following [16], Fig. 2 divides ECG recordings into four main intervals: interictal (the signal baseline, at least 30 min before seizure onset), preictal (the period from 15 to 5 min before seizure onset), ictal (the seizure, starting at the onset identified in the dataset), and postictal (the period immediately after the seizure ends). As with EEG-based approaches, ECG-based seizure prediction relies on discriminating between these signal phases. Thus, a true alarm is a preictal period detected within a predetermined interval [17].
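The interval definitions above can be sketched as a labeling function. This is a minimal sketch that assumes one annotated seizure per recording with known onset and offset times in minutes; `label_period` and its `postictal_len_min` parameter are hypothetical, since the text does not fix the postictal length. Time points in the unlabeled gaps (15 to 30 min and 0 to 5 min before onset) return `None` and would simply be discarded.

```python
from typing import Optional


def label_period(t_min: float, onset_min: float, offset_min: float,
                 postictal_len_min: float = 30.0) -> Optional[str]:
    """Label a time point (minutes from recording start) relative to one seizure.

    interictal: at least 30 min before onset
    preictal:   5 to 15 min before onset
    ictal:      from onset until offset
    postictal:  right after offset (length is an assumed parameter)
    Returns None for the transition zones that belong to no period.
    """
    if t_min >= offset_min:
        return "postictal" if t_min < offset_min + postictal_len_min else None
    if t_min >= onset_min:
        return "ictal"
    minutes_to_onset = onset_min - t_min
    if minutes_to_onset >= 30.0:
        return "interictal"
    if 5.0 <= minutes_to_onset <= 15.0:
        return "preictal"
    return None  # excluded gap between labeled periods


# Hypothetical seizure lasting from minute 60 to minute 62.
for t in (10, 50, 58, 61, 70):
    print(t, label_period(t, onset_min=60, offset_min=62))
# 10 interictal, 50 preictal, 58 None, 61 ictal, 70 postictal
```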

Fig. 2. Periods in an epilepsy ECG recording. An alarm triggered in the interictal period is considered a false alarm, whereas one triggered in the preictal period is considered a true alarm.

To evaluate the performance of the supervised machine learning models for seizure prediction, each model was analyzed in terms of accuracy (ACC), sensitivity (Se), specificity (Spe), positive predictive value (PPV), negative predictive value (NPV), and false-positive rate (FPR). Computational cost was measured as the average, over 20 runs, of the training and validation time using randomized search cross-validation with twenty iterations and five folds. For simplicity, and because the postictal phases are not significantly different from the interictal phases, we only consider the interictal, preictal, and ictal phases in this work.
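The evaluation protocol above can be sketched in a few lines. The metrics are computed one-vs-rest, treating a single period (e.g. preictal) as the positive class; with this definition FPR equals 1 − Spe, which agrees with the reported ADA figures (1 − 0.9866 ≈ 0.013). The timing helper wraps scikit-learn's `RandomizedSearchCV` with `n_iter=20` and `cv=5`; the decision tree and its search space are hypothetical placeholders, and the function names are illustrative.

```python
import time
from typing import Dict, Sequence

import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier


def one_vs_rest_metrics(y_true: Sequence, y_pred: Sequence, positive) -> Dict[str, float]:
    """ACC, Se, Spe, PPV, NPV and FPR with `positive` as the positive class."""
    t = np.asarray(y_true) == positive
    p = np.asarray(y_pred) == positive
    tp = int(np.sum(t & p))
    tn = int(np.sum(~t & ~p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    return {
        "ACC": (tp + tn) / len(t),
        "Se": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FPR": fp / (fp + tn),  # equals 1 - Spe
    }


def mean_search_time(X, y, runs: int = 20, seed: int = 0) -> float:
    """Average wall-clock time of a randomized search (20 iterations, 5 folds)."""
    space = {"max_depth": list(range(2, 65)), "min_samples_leaf": [1, 2, 4, 8]}
    times = []
    for r in range(runs):
        search = RandomizedSearchCV(DecisionTreeClassifier(random_state=seed), space,
                                    n_iter=20, cv=5, random_state=seed + r)
        start = time.perf_counter()
        search.fit(X, y)
        times.append(time.perf_counter() - start)
    return float(np.mean(times))


y_true = ["pre", "pre", "inter", "inter", "ictal"]
y_pred = ["pre", "inter", "inter", "pre", "ictal"]
print(one_vs_rest_metrics(y_true, y_pred, positive="pre"))
```

Each ratio raises `ZeroDivisionError` when its denominator is empty (e.g. no positive predictions), which a fuller implementation would handle explicitly.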

2.1 ECG Databases

The AI algorithms were evaluated with PhysioNet databases [18]. The first dataset consists of seven ECG recordings with ten identified seizures from a heterogeneous group of patients with partial epilepsy who exhibited postictal heart rate oscillations. These recordings, sampled at 200 Hz, contain notable low-frequency transients in the 10–100 mHz range. The second dataset includes recordings from fourteen patients admitted to the Department of Neurology and Neurophysiology at the University of Siena [19]. The patients, nine men (aged 25–71) and five women (aged 20–58), were monitored with video-EEG, and the recordings include one or two ECG channels sampled at 512 Hz.
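Because the two databases use different sampling rates (200 Hz and 512 Hz), RR intervals must be expressed in physical time before features from both can be pooled. A minimal sketch, assuming R-peak sample indices are already available (for instance from the heartbeat detector); `rr_intervals_ms` is a hypothetical helper name.

```python
import numpy as np


def rr_intervals_ms(r_peak_samples, fs_hz: float) -> np.ndarray:
    """Convert R-peak sample indices into RR intervals in milliseconds."""
    peaks = np.asarray(r_peak_samples, dtype=float)
    return np.diff(peaks) * 1000.0 / fs_hz


print(rr_intervals_ms([0, 200, 410], fs_hz=200))   # 200 Hz: 1000 ms and 1050 ms
print(rr_intervals_ms([0, 512, 1024], fs_hz=512))  # 512 Hz: 1000 ms and 1000 ms
```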

2.2 HRV Features

For HRV-based prediction, several metrics supported by the state of the art were extracted. They can be divided into two groups: linear and nonlinear. The main linear metrics derived from beat-to-beat intervals (a.k.a. RR intervals) are the average of the NN (normal-to-normal) intervals, the root mean square of successive differences between normal heartbeats (RMSSD), and the standard deviation of NN intervals (SDNN). Nonlinear metrics derived from the Poincaré plot quantify self-similarity in a time series, such as the NN intervals formed from identified beats. The Cardiac Sympathetic Index (CSI), the Cardiac Vagal Index (CVI), the modified CSI [2], and the CSI/CVI ratio are the most commonly used nonlinear metrics. Besides these nonlinear metrics, we also examined one of the Hjorth parameters commonly used to analyze EEG signals, the Hjorth activity (AT) [20], which is the variance of a time function.
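The features above can be sketched for a single window of NN intervals. The Poincaré quantities follow one common convention, which is an assumption here rather than the authors' stated formulas: SD1 and SD2 are the standard deviations across and along the identity line, T = 4·SD1, L = 4·SD2, CSI = L/T, CVI = log10(L·T), and modified CSI = L²/T. Applying the Hjorth activity directly to the RR series is also an assumption; `hrv_features` is a hypothetical name.

```python
import numpy as np


def hrv_features(rr_ms) -> dict:
    """Linear, Poincare-based and Hjorth features from one window of NN intervals (ms).

    Assumes the window is not perfectly constant, otherwise T = 0.
    """
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)
    sd1 = np.std(diff / np.sqrt(2))                 # spread across the identity line
    sd2 = np.std((rr[1:] + rr[:-1]) / np.sqrt(2))   # spread along the identity line
    T, L = 4 * sd1, 4 * sd2
    csi = L / T
    cvi = np.log10(L * T)
    return {
        "mean_nn": rr.mean(),
        "rmssd": np.sqrt(np.mean(diff ** 2)),
        "sdnn": rr.std(ddof=1),
        "L": L,
        "T": T,
        "csi": csi,
        "cvi": cvi,
        "mod_csi": L ** 2 / T,
        "csi_cvi_ratio": csi / cvi,
        "hjorth_activity": rr.var(),  # Hjorth activity: variance of the signal
    }


print(hrv_features([800, 810, 790, 805, 795]))
```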

2.3 Hypothesis

As in EEG approaches to seizure prediction, the basic assumption is that a seizure does not occur suddenly; rather, the process begins several minutes to hours before the clinical manifestations, in the preictal phase [17]. Based on this assumption, previous research [17, 11, 16] has extracted various features from brain activity to characterize significant preictal changes in EEG signals [17]. The same approach applies to HRV-based techniques [21, 5], by extracting the metrics relevant for seizure prediction, as described below.

  • Linear metrics: For the linear metrics, we made the following assumptions based on [22]. As heart rate increases, NN intervals become shorter and HRV decreases in absolute value. During ictal periods, heart rate increases, indicating a decrease in HRV relative to the interictal phase (baseline) and a dominance of the sympathetic branch (activated under stress) over vagal activity (activated at rest).

  • Nonlinear metrics: Lower variability in the autonomic nervous system (ANS) indicates a dominant role of the sympathetic nervous system (higher heart rate) over the parasympathetic nervous system [22]. It is reasonable to assume that the CVI is not significantly altered by parasympathetic activity during the preictal and ictal phases. In contrast, CSI and modified CSI are expected to increase markedly as seizure onset approaches (preictal) and during the seizure (ictal), compared with the baseline (interictal phase). The Hjorth activity is expected to reflect the growing imbalance between the sympathetic and parasympathetic branches, i.e., as seizure onset approaches, AT increases relative to the baseline.
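The hypotheses above can be turned into a simple directional check of a later phase (preictal or ictal) against the interictal baseline. This is an illustrative sketch, not a statistical test: it only compares means of window-level feature values. The feature keys and `check_expected_direction` are hypothetical, and CVI is left out because it is expected not to change.

```python
import numpy as np

# Expected sign of (phase mean - baseline mean), per the hypotheses:
# shorter NN intervals; higher CSI, modified CSI and Hjorth activity.
EXPECTED_DIRECTION = {"mean_nn": -1, "csi": +1, "mod_csi": +1, "hjorth_activity": +1}


def check_expected_direction(baseline: dict, phase: dict) -> dict:
    """Map each feature to True if its mean moved in the hypothesized direction.

    baseline / phase map feature names to sequences of window-level values.
    """
    result = {}
    for name, sign in EXPECTED_DIRECTION.items():
        change = np.mean(phase[name]) - np.mean(baseline[name])
        result[name] = bool(np.sign(change) == sign)
    return result
```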

3 Supervised Machine Learning Classifiers

To balance the training of the estimators, we used stratification, which preserves the percentage of samples of each labeled period [23]. Following the flow shown in Fig. 3, the first step is the acquisition of RR intervals from a low-power analog front-end (see Fig. 1); the RR intervals are then screened to remove ectopic and outlier beats, and HRV features are extracted and segmented. Next, each segmented feature is scaled individually to the range from zero to one based on its minimum and maximum values. In the last step, the models are trained, validated, and tested with entirely new data, and the best model in terms of computational cost, sensitivity, and false-positive rate is embedded into a low-power microcontroller.
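The preprocessing and scaling steps above might look as follows. The ectopic filter is an assumption, since the text does not give its rule: it drops intervals outside 300 to 2000 ms or differing by more than 20% from the last accepted interval. Scaling uses scikit-learn's `MinMaxScaler` fitted on the training split only, and `SVC` is just a placeholder classifier. The random stratified hold-out split is for illustration; the text tests on new data from other patients. Function names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def remove_ectopic(rr_ms, max_rel_change=0.2, lo_ms=300.0, hi_ms=2000.0):
    """Drop RR intervals outside [lo_ms, hi_ms] or changing by more than
    max_rel_change relative to the last accepted interval (assumed thresholds)."""
    kept = []
    for x in np.asarray(rr_ms, dtype=float):
        if not lo_ms <= x <= hi_ms:
            continue
        if kept and abs(x - kept[-1]) / kept[-1] > max_rel_change:
            continue
        kept.append(x)
    return np.asarray(kept)


def stratified_scaled_fit(X, y, test_size=0.2, seed=0):
    """Stratified split, per-feature [0, 1] scaling fitted on training data, fit."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    model = make_pipeline(MinMaxScaler(feature_range=(0, 1)), SVC())
    model.fit(X_tr, y_tr)
    return model, X_te, y_te


print(remove_ectopic([800, 810, 400, 1200, 805, 2500]))  # keeps 800, 810, 805
```

Putting the scaler inside the pipeline keeps the test data's minimum and maximum out of the fitted scaling, so no information leaks from the held-out set.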

Fig. 3. Flowchart.

Of the trained, validated, and tested models, this work considered only those that achieved reasonable sensitivity, i.e., an overall average sensitivity over the three ECG periods above 85%. This criterion helps ensure that the selected model is reliable enough for real-time applications. The AI models used in this work include Decision Tree (DT), Extra Tree (ET), Random Forest (RF), k-nearest neighbors (KNN), Adaptive Boost (ADA), Support Vector Machine (SVC), and Artificial Neural Network (ANN).
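The 85% sensitivity criterion can be sketched as a filter over fitted models. Here "overall average of the three ECG periods" is read as the unweighted (macro) mean of the per-period recall, which is an assumption; scikit-learn's `recall_score(..., average="macro")` computes exactly that. Only two model families with default settings are shown, and `select_models` is a hypothetical name.

```python
from sklearn.metrics import recall_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def select_models(models, X_tr, y_tr, X_te, y_te, threshold=0.85):
    """Fit each model and keep those whose macro-averaged sensitivity
    (unweighted mean of per-period recall) exceeds the threshold."""
    kept = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        macro_se = recall_score(y_te, model.predict(X_te), average="macro")
        if macro_se > threshold:
            kept[name] = macro_se
    return kept


# Two of the seven model families, with default settings, for illustration.
CANDIDATES = {
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}
```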

4 Results and Discussion

Fig. 4 depicts the time evolution of the CSI and CVI metrics of a patient who had a seizure 8.13 min after the start of the recording. In Fig. 4-a), the contribution of parasympathetic activity to cardiac regulation is higher than that of sympathetic activity until around 5 min before seizure onset. From that point on, Fig. 4-b) shows a trend toward sympathetic predominance, which can be useful for prediction alarm systems.

Principal component analysis (PCA) [24] was used to reduce the seventeen features to three: the average of the RR intervals, the Hjorth activity (AT), and the modified CSI. Fig. 5 shows these features extracted from an ECG signal in the Siena dataset, corresponding to a 27-year-old female with three focal onset impaired awareness seizures (IAS).
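PCA by itself returns linear combinations of all input features, so going from seventeen features to three named ones needs an additional selection rule, which the text does not specify. One common heuristic, assumed here, ranks each original feature by its absolute loadings weighted by each component's explained-variance ratio. `top_features_by_pca` is a hypothetical helper; in practice the features should be scaled first.

```python
import numpy as np
from sklearn.decomposition import PCA


def top_features_by_pca(X, names, n_components=3, n_keep=3):
    """Rank original features by variance-weighted absolute PCA loadings."""
    pca = PCA(n_components=n_components).fit(X)
    # components_ has shape (n_components, n_features).
    scores = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    order = np.argsort(scores)[::-1][:n_keep]
    return [names[i] for i in order]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))
X[:, [0, 5, 9]] *= 10.0  # synthetic: three dominant features
print(top_features_by_pca(X, [f"f{i}" for i in range(17)]))
```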

The three features were computed by the preprocessing algorithm over a 5-min window sliding in steps of 1 s. As seizure onset approaches, heart rate tends to increase, which shortens the RR intervals and decreases HRV. In contrast, the Hjorth activity (AT) and the modified CSI increase significantly, as shown by the green-highlighted markers. Fig. 5-b) also shows two red-highlighted markers where the modified CSI does not change; these correspond to misclassifications or false-positive detections by the models.
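A 5-min window sliding in 1-s steps over an RR series can be sketched as below. Beat times are assumed to be the cumulative sum of the RR intervals, and each window keeps the intervals that end inside it; `sliding_windows` is a hypothetical generator, and any per-window feature (for example the Poincaré indices) could be computed on each slice.

```python
import numpy as np


def sliding_windows(rr_ms, window_s=300.0, step_s=1.0):
    """Yield (window_start_s, rr_in_window) for windows over an RR series."""
    rr = np.asarray(rr_ms, dtype=float)
    beat_t = np.cumsum(rr) / 1000.0  # time (s) at which each interval ends
    start = 0.0
    while start + window_s <= beat_t[-1]:
        mask = (beat_t > start) & (beat_t <= start + window_s)
        yield start, rr[mask]
        start += step_s


rr = np.full(600, 1000.0)  # synthetic: 10 min of a steady 60 bpm rhythm
means = [w.mean() for _, w in sliding_windows(rr)]
```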

Fig. 4. CVI and CSI time evolution in two cases: a) instants before seizure onset, and b) around seizure onset. In the first case, the amplitude of sympathetic activity is lower than that of parasympathetic activity; in the second case, the sympathetic branch becomes dominant 5 min before seizure onset.

The Postictal and Siena datasets, containing interictal, preictal, and ictal ECG phases, were used for training and validation, totaling 97.47 h. In addition, 14.08 h of new data from six patients were used to test the models. In the last test evaluation, false-positive rates were calculated using 31.17 h of interictal data from the MIT-BIH ECG database. Using the Python scikit-learn library [23], the following model configurations were used to obtain the results shown in Table 1.

  • Decision Tree (DT): class weight: “balanced”, criterion: “entropy”, maximum depth: 64, minimum samples leaf: 4, minimum samples split: 16;

  • Extra Tree (ET): criterion: “entropy”, maximum depth: 30, minimum samples leaf: 2, minimum samples split: 16;

  • Random Forest (RF): bootstrap: False, maximum depth: 20, minimum samples leaf: 2;

  • K-Nearest Neighbours (KNN): algorithm: “kd_tree”, leaf size: 40;

  • Adaptive Boost (ADA): learning rate: 0.8, number of estimators: 32, base estimator: Decision Tree;

  • Support Vector Machine Classifier (SVC): C: 100, class weight: “balanced”, coef0: 1, decision function shape: “ovo”, degree: 10, kernel: “poly”;

  • Artificial Neural Network (ANN): weight constraint: 5, loss: “categorical_crossentropy”, epochs: 200, dropout rate: 0.0, batch size: 100, output activation: “softmax”; built with two hidden layers of 128 and 64 neurons using the “relu” activation function.

Fig. 5. Normalized features extracted from a patient with two IAS: a) average RR interval, b) modified CSI, and c) Hjorth activity (AT).

According to Table 1, the following conclusions can be drawn. ANN achieved the highest sensitivity, recognizing preictal periods with 100% sensitivity and a 100% negative predictive value (NPV); that is, whenever ANN classified a window as non-preictal, the patient was indeed not in the preictal period. These results are especially interesting because they can be applied to wearable alarm systems that alert the patient in the event of a predicted seizure. Conversely, if no alarm is raised, the patient can go about daily life normally.

In terms of specificity, i.e., the proportion of non-preictal data correctly classified as non-preictal, the best model was ADA, with 98.66%. ADA also achieved the lowest FPR, just 0.013, and the highest positive predictive value (PPV) for preictal periods among all models, 97.06%, when considering both datasets. Despite these results, ADA missed nearly 11.6% of the preictal periods compared with ANN, which may be a problem for patients who have several seizures within short periods. The SVC model reached the best accuracy, 97.57%, for preictal period identification. Despite the second-worst time cost, SVC achieved the second-highest Se, Spe, and NPV scores: 97.70%, 97.51%, and 98.83%, respectively.

Table 1. Validation results for different predictor AI models.

This combination of scores makes SVC the best classifier for seizure prediction on the datasets used. Furthermore, in wearable applications, the SVC model can be trained offline or in the cloud and then embedded into a microcontroller, which mitigates its time cost.
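As a sketch of the offline-training route, a fitted scikit-learn `SVC` can be reduced to a few arrays (`support_vectors_`, `dual_coef_`, `intercept_`) plus the kernel constants, and its decision function re-evaluated with plain arithmetic, which is what a microcontroller port would do. This example is binary (e.g. preictal vs. rest); the paper's three-class "ovo" model would need one such function per class pair plus a vote. `gamma` is fixed to a hypothetical 0.5 so the exported kernel is fully determined, and the helper names and synthetic data are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def export_poly_svc(clf: SVC) -> dict:
    """Collect the parameters needed to evaluate a binary poly-kernel SVC."""
    return {
        "sv": clf.support_vectors_,
        "alpha_y": clf.dual_coef_[0],  # signed dual coefficients
        "b": clf.intercept_[0],
        "gamma": clf.gamma,            # must be a number, not "scale"
        "coef0": clf.coef0,
        "degree": clf.degree,
    }


def decision(params: dict, x: np.ndarray) -> float:
    """f(x) = sum_i alpha_i y_i (gamma <sv_i, x> + coef0)^degree + b."""
    k = (params["gamma"] * (params["sv"] @ x) + params["coef0"]) ** params["degree"]
    return float(params["alpha_y"] @ k + params["b"])


X, y = make_classification(n_samples=150, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X = MinMaxScaler().fit_transform(X)
clf = SVC(kernel="poly", degree=10, coef0=1, C=100, gamma=0.5).fit(X, y)
params = export_poly_svc(clf)
print(decision(params, X[0]), clf.decision_function(X[:1])[0])  # should agree
```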

5 Conclusion

This study compared supervised machine learning algorithms for predicting epileptic seizures. Three features extracted from ECG were used to train, validate, and test several models, using three databases that together provide more than 100 h of data. Considering only preictal periods, the best model was SVC, which obtained the highest accuracy, 97.57%, and the second-highest Se, Spe, and NPV scores: 97.70%, 97.51%, and 98.83%, respectively. Based on these results, patients refractory to conventional treatments may benefit from wearable devices with embedded trained SVC models that predict seizures, improving their quality of life.