
1 Introduction

Analyzing neural activity with electroencephalography (EEG) plays an important role in neuroscience. It provides a non-invasive way to understand brain dynamics and pathology. Clinically, EEG is crucial in the study and diagnosis of a wide range of diseases, such as meningitis, encephalitis and brain parasites. Characteristic EEG waveforms, taken together with clinical symptoms, help doctors reach a diagnosis. In the therapeutic setting, EEG can be used to identify and treat epilepsy [15], to study sleep and to identify insomnia. In cognitive research, EEG is used to investigate cognitive processes such as attention, memory, and emotion, as well as human-computer interaction, for example in brain-computer interfaces (BCI).

Artifacts are undesired signals that get mixed into the data collected by the recording system. They degrade the quality of the EEG signal and make its analysis more challenging, especially on wearable devices [16], and they make the recording harder for doctors to read and use [14]. This can make a diagnosis difficult to determine and can even lead to diagnostic errors. Even worse, some sophisticated computerized instruments cannot analyze EEG precisely in the presence of artifacts. Artifacts also hinder the application of artificial intelligence in this field [23, 26, 27].

Generally, artifacts in EEG fall into two types: non-physiological and physiological. Non-physiological artifacts are caused by improper conduct during recording, such as electrode displacement and body movement. They can be reduced by proper subject instructions and experimental setup [12]. Physiological artifacts, in contrast, can hardly be avoided during EEG data collection because they arise from the normal physiological activity of the subject. They mainly comprise ocular artifacts, cardiac artifacts, and muscle artifacts [18]. One of the most common artifacts affecting the quality of EEG signals is the EOG artifact (EOA), whose magnitude is usually much higher than that of the EEG signal.

As a result, identifying and removing artifacts, whether in clinical diagnosis or practical applications, is the most crucial preprocessing step before further analysis. Regression methods are the conventional approach for reducing artifacts in EEG [10], while Blind Source Separation (BSS) is one of the most commonly used techniques for removing physiological artifacts [8, 12]. BSS is a family of algorithms that aim to separate a set of source signals S from a set of observed mixtures of S, without information about S or the mixing process. Among BSS algorithms, independent component analysis (ICA)-based methods are the most commonly used for artifact removal [4, 5]. Empirical Mode Decomposition (EMD) is another signal decomposition algorithm commonly used in EEG artifact removal, and over the last decade it has often been combined with ICA, e.g., as EEMD-ICA [13].

EEMD-ICA is a hybrid artifact removal technique [12]. The nature of EEMD allows this method to be used on both single-channel and multi-channel EEG signals. Strictly speaking, EEMD-ICA is merely a tool for decomposing signals, and the most crucial step is to identify the artifactual components. Therefore, this study explores the performance of different methods for identifying artifactual components within the EEMD-ICA framework. Comprehensive testing of four different ways of identifying artifactual components indicates that the autocorrelation-based method performs best. This study also draws the preliminary conclusion that, from the perspective of the performance metrics, these artifact removal methods may be counterproductive when the SNR is high.

2 Methods

In this section, we first outline the techniques employed in the paper and then describe the artifactual component identification methods.

2.1 Blind Source Separation (BSS)

Blind source separation (BSS) is one of the most widely used techniques for removing physiological artifacts [12, 19]. It is a family of algorithms that aim to separate a set of source signals S from a set of observed mixtures of S, without information about S or the mixing process. Let X be the multi-channel EEG signal, formed as a linear mixture of the sources S, and let A be the unknown mixing matrix. Then, mathematically,

$$\begin{aligned} X=A S, \end{aligned}$$
(1)

in this way, an un-mixing matrix W can be estimated by BSS to separate the original sources,

$$\begin{aligned} \hat{S}=W X \end{aligned}$$
(2)

where \(\hat{S}\) is the estimate of the original sources.
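As a minimal illustration of Eqs. 1 and 2, the sketch below mixes two toy sources with a known 2 x 2 matrix and recovers them with its inverse. A real BSS algorithm has to estimate W from X alone; the known inverse, the helper names (`matmul`, `inverse_2x2`) and the toy signals are assumptions made purely for illustration.

```python
import math

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n), both given as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse_2x2(m):
    """Invert a 2 x 2 matrix; raises ValueError if it is singular."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]

# Toy sources S (rows = sources): one fast oscillation and one slow one.
n = 100
S = [[math.sin(2 * math.pi * 10 * t / n) for t in range(n)],
     [math.cos(2 * math.pi * 1 * t / n) for t in range(n)]]

A = [[1.0, 0.5], [0.3, 1.0]]  # assumed mixing matrix
X = matmul(A, S)              # Eq. 1: observed channels
W = inverse_2x2(A)            # stand-in for the un-mixing matrix BSS would estimate
S_hat = matmul(W, X)          # Eq. 2: recovered sources

# Largest absolute difference between the true and the recovered sources.
max_error = max(abs(s - sh)
                for row, row_hat in zip(S, S_hat)
                for s, sh in zip(row, row_hat))
```

With an exact W the recovery error is at the level of floating-point rounding; with an estimated W, the sources are typically recovered only up to order, sign and scale.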

Numerous BSS techniques, such as independent component analysis (ICA), principal component analysis (PCA), canonical correlation analysis (CCA), and Empirical Mode Decomposition (EMD), have been developed to eliminate artifacts from EEG data.

2.2 Independent Component Analysis (ICA)

Among BSS algorithms, independent component analysis (ICA)-based methods are dominant for artifact removal [4, 5]. ICA separates the raw signal into independent components (ICs), each corresponding to one source of the signal. The raw signal can be restored from the ICs via inverse ICA. As shown in Fig. 1, the raw multi-channel EEG signal is first unmixed into n ICs, and components that do not originate from neuronal activity are rejected. Artifact-free EEG is then obtained by applying inverse ICA to the remaining ICs.

Fig. 1. The general design of the ICA-based artifact removal method. The artifact-free signal is recovered from the remaining ICs after rejecting the artifactual ones.
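The reject-and-reconstruct step can be sketched as follows, assuming the mixing matrix and the ICs are already available from some ICA implementation. The function name `reconstruct_without`, the toy matrices and the choice of which IC to reject are hypothetical.

```python
def reconstruct_without(mixing, components, rejected):
    """Remix components into channels with the rejected component rows zeroed out.

    mixing:     channels x components matrix (the estimated mixing matrix).
    components: components x samples matrix (the ICs).
    rejected:   set of IC indices judged to be artifactual.
    """
    kept = [[0.0] * len(row) if k in rejected else row
            for k, row in enumerate(components)]
    n_samples = len(components[0])
    return [[sum(mixing[c][k] * kept[k][t] for k in range(len(kept)))
             for t in range(n_samples)]
            for c in range(len(mixing))]

# Hypothetical example: two channels, two ICs, IC 1 flagged as an artifact.
A = [[1.0, 2.0], [0.5, 3.0]]
ics = [[1.0, -1.0, 1.0], [5.0, 5.0, 5.0]]
clean = reconstruct_without(A, ics, rejected={1})
```

Zeroing a component before remixing removes its contribution from every channel at once, which is why the quality of the result depends almost entirely on which ICs are flagged.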

Although ICA is a powerful tool for artifact removal, it has two major constraints. (a) ICA by its nature requires the number of input channels to be at least as large as the number of sources. If this requirement is not met, it may fail to separate the artifacts from the neural components [8]. (b) To generate a reliable decomposition, ICA requires the input signal to have enough samples. For ICA decomposition of an EEG recording, it is presently recommended that the recording have at least \(30 *( \text{ the } \text{ number } \text{ of } \text{ input } \text{ channels } )^2\) data samples [7].
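To give a feel for constraint (b), the short sketch below evaluates the recommended sample count and converts it to a recording duration. The 32-channel montage and the 256 Hz sampling rate are assumed example values, not settings from this study.

```python
def min_samples_for_ica(n_channels, factor=30):
    """Recommended minimum number of samples: factor * n_channels ** 2."""
    return factor * n_channels ** 2

def min_duration_seconds(n_channels, sampling_rate_hz):
    """Recording length needed to reach that sample count at a given sampling rate."""
    return min_samples_for_ica(n_channels) / sampling_rate_hz

# Assumed example: 32 channels sampled at 256 Hz.
samples_needed = min_samples_for_ica(32)
seconds_needed = min_duration_seconds(32, 256)
```

Because the requirement grows quadratically with the channel count, doubling the number of channels quadruples the amount of data needed.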

2.3 Empirical Mode Decomposition (EMD)

Empirical Mode Decomposition (EMD) is another BSS algorithm used in EEG artifact removal. EMD takes a single-channel signal and iteratively decomposes it into intrinsic mode functions (IMFs) and a residual:

$$\begin{aligned} x(t) = \sum ^N_{i=1}imf_i(t)+r_N(t) \end{aligned}$$
(3)

where \(r_N\) is the residual after N IMFs have been extracted. The extraction of IMFs stops when the halting criteria are met or the target number of IMFs has been obtained. Compared with other signal decomposition methods such as ICA and PCA, EMD is more robust in the sense that it places no requirements on the input signal. Although EMD can be used on its own [11], it is often used to expand the channel number of the EEG signal, so that EEG recordings with few channels can also be processed with ICA and CCA [3, 24].
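The iterative extraction can be sketched in a deliberately simplified form. The version below uses piecewise-linear envelopes and a fixed number of sifting passes, whereas standard EMD uses cubic-spline envelopes and a convergence-based stopping rule; these simplifications, the function names and the default parameters are assumptions chosen to keep the example dependency-free.

```python
import math

def local_extrema(x):
    """Return the indices of interior local maxima and minima of the list x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through (i, x[i]) for i in idx, held flat at both ends."""
    n = len(x)
    knots = [0] + idx + [n - 1]
    vals = [x[idx[0]]] + [x[i] for i in idx] + [x[idx[-1]]]
    env = [0.0] * n
    for k in range(len(knots) - 1):
        i0, i1, v0, v1 = knots[k], knots[k + 1], vals[k], vals[k + 1]
        for i in range(i0, i1 + 1):
            env[i] = v0 + (i - i0) / (i1 - i0) * (v1 - v0)
    return env

def emd(x, max_imfs=4, n_sift=10):
    """Simplified EMD: returns (imfs, residual) with x = sum(imfs) + residual."""
    residual = list(x)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left to extract another IMF
        h = list(residual)
        for _ in range(n_sift):
            maxima, minima = local_extrema(h)
            if not maxima or not minima:
                break
            upper, lower = envelope(h, maxima), envelope(h, minima)
            # Subtract the local mean of the two envelopes (one sifting pass).
            h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        residual = [r - v for r, v in zip(residual, h)]
    return imfs, residual

# Toy signal: a fast and a slow oscillation.
x = [math.sin(2 * math.pi * 20 * t / 400) + math.sin(2 * math.pi * 2 * t / 400)
     for t in range(400)]
imfs, residual = emd(x)
reconstructed = [sum(imf[t] for imf in imfs) + residual[t] for t in range(len(x))]
```

Whatever envelope or stopping rule is used, the decomposition is complete by construction: summing the IMFs and the residual returns the input, as in Eq. 3.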

One disadvantage of EMD is its susceptibility to noise, which leads to mode mixing [22]. Ensemble EMD (EEMD) [21] increases the robustness of EMD by decomposing many noise-added copies of the signal and taking the ensemble average of the resulting IMFs as the final IMFs. In some circumstances, the remaining IMFs, once reconstructed, can be passed on to a separate artifact removal stage to further improve the quality of the EEG data.

2.4 EEMD-ICA

The idea of combining EEMD and ICA was first introduced to the task of EEG artifact removal in 2010 [13]. The authors of that paper explored the theoretical best performance of EEMD-ICA, yet their method can hardly be used in practical situations. Multiple improved methods have been put forward since then [1, 24] to make EEMD-ICA an automatic artifact removal method. Despite the many variants, the general idea of the method remains the same. As shown in Fig. 2, the paradigm is concise: decompose the raw signal, reject the artifactual components and reconstruct the artifact-free EEG signal. The main difference between the variants of EEMD-ICA lies in the rules for component rejection, in other words, the method used to identify artifactual components.

Fig. 2. The framework of the EEMD-ICA artifact removal method. It consists of four steps: (1) Decomposition of neural data with EEMD. (2) Artifact concentration with ICA. (3) Identification and rejection of artifactual components. (4) Signal reconstruction with the remaining components.

2.5 Description of Simulated EEG Data

In this paper, the simulated EEG data are generated from EEG and EOG segments taken from two public datasets [9, 25]. The EOG artifact is modeled as a combination of horizontal EOG (HEOG) and vertical EOG (VEOG) [6]:

$$\begin{aligned} Artifact_{EOG} = \mu x_{HEOG} + \epsilon x_{VEOG} \end{aligned}$$
(4)

where \(\mu \) and \(\epsilon \) represent the contributions of HEOG and VEOG, respectively. A wide variety of EOG artifacts can be generated by varying the coefficients and the HEOG and VEOG segments. The contaminated EEG signal \(EEG_{Contaminated}\) is then generated by mixing \(Artifact_{EOG}\) with \(EEG_{Pure}\):

$$\begin{aligned} EEG_{Contaminated} = EEG_{Pure} + a (\mu x_{HEOG} + \epsilon x_{VEOG} ) \end{aligned}$$
(5)

where a represents the contribution of the artifact. Hence, the signal-to-noise ratio (SNR) of \(EEG_{Contaminated}\) is given by:

$$\begin{aligned} SNR = 10\log _{10}\frac{RMS(EEG_{Pure})}{RMS(a \cdot Artifact_{EOG})} \end{aligned}$$
(6)

Once \(EEG_{Pure}\) and \(Artifact_{EOG}\) are fixed, the SNR of the generated \(EEG_{Contaminated}\) can be controlled by adjusting the coefficient a.
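Since RMS(a x) = a RMS(x) for a > 0, Eq. 6 can be solved for the coefficient: a = RMS(EEG_Pure) / (RMS(Artifact_EOG) * 10^(SNR/10)). The sketch below applies this to toy signals. The function names and the synthetic segments are assumptions; the study itself uses recorded EEG and EOG segments.

```python
import math

def rms(x):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def eog_artifact(heog, veog, mu=1.0, eps=1.0):
    """Eq. 4: weighted combination of horizontal and vertical EOG."""
    return [mu * h + eps * v for h, v in zip(heog, veog)]

def scale_for_snr(eeg_pure, artifact, target_snr):
    """Solve Eq. 6 for the coefficient a that yields the target SNR."""
    return rms(eeg_pure) / (rms(artifact) * 10 ** (target_snr / 10))

def contaminate(eeg_pure, artifact, a):
    """Eq. 5: add the scaled artifact to the clean EEG."""
    return [e + a * v for e, v in zip(eeg_pure, artifact)]

def snr_of(eeg_pure, artifact, a):
    """Eq. 6 evaluated for a given coefficient a."""
    return 10 * math.log10(rms(eeg_pure) / rms([a * v for v in artifact]))

# Toy stand-ins for recorded segments (assumed, not from the datasets).
n = 512
eeg_pure = [math.sin(2 * math.pi * 10 * t / 256) for t in range(n)]
heog = [math.exp(-((t - 150) / 30.0) ** 2) for t in range(n)]
veog = [3.0 * math.exp(-((t - 300) / 20.0) ** 2) for t in range(n)]

artifact = eog_artifact(heog, veog, mu=1.0, eps=1.0)
a = scale_for_snr(eeg_pure, artifact, target_snr=-1.0)
eeg_contaminated = contaminate(eeg_pure, artifact, a)
achieved_snr = snr_of(eeg_pure, artifact, a)
```

Solving for a in closed form avoids any search over the coefficient, so each group of segments can be generated at its target SNR in a single pass.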

3 Identification of Artifactual Components

This study explores the performance of four EEMD-ICA-based artifact removal methods. Named after their artifactual component identification methods, these approaches are denoted as EEMD-ICA\(_{kurt}\), EEMD-ICA\(_{entropy}\), EEMD-ICA\(_{autocor}\) and EEMD-ICA\(_{eogcor}\).

3.1 Kurtosis and Entropy

Abnormalities such as blinks and discontinuities are normally characterized by a peaky distribution of potential values. Kurtosis and entropy can capture these characteristics. EEMD-ICA\(_{kurt}\) and EEMD-ICA\(_{entropy}\) use kurtosis and entropy, respectively, as the indicator of an artifactual component. ICs with highly positive kurtosis or entropy are identified as artifacts. Similar practices were common in previous studies [3, 5, 24]. The definition of kurtosis is unique, whereas there are multiple types of entropy, e.g., Approximate Entropy, Sample Entropy and Fuzzy Entropy. The entropy applied in this study is Sample Entropy [17].
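Both indicators can be computed with a few lines of code. The sketch below uses excess kurtosis and a straightforward quadratic-time Sample Entropy; the template length m = 2 and the tolerance r = 0.2 times the standard deviation are commonly used defaults assumed here, not values reported in this study, and the toy signals are synthetic.

```python
import math
import random
import statistics

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / len(x)
    m4 = sum((v - mean) ** 4 for v in x) / len(x)
    return m4 / (m2 ** 2) - 3.0

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample Entropy with template length m and tolerance r = r_factor * std(x)."""
    n = len(x)
    r = r_factor * statistics.pstdev(x)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                dist = max(abs(p - q) for p, q in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when there are no matches; reported as infinity
    return -math.log(a / b)

# Toy signals: a blink-like bump, a regular sine and white noise.
n = 200
blink = [math.exp(-((t - 100) / 5.0) ** 2) for t in range(n)]
sine = [math.sin(2 * math.pi * t / 25) for t in range(n)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

kurt_blink, kurt_sine = excess_kurtosis(blink), excess_kurtosis(sine)
sampen_sine, sampen_noise = sample_entropy(sine), sample_entropy(noise)
```

The blink-like bump has strongly positive kurtosis while the sine has negative kurtosis, and the regular sine has lower Sample Entropy than the noise. How these scores are thresholded to flag an IC is a separate design choice that the sketch does not address.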

3.2 Autocorrelation

Autocorrelation describes how strongly a signal correlates with itself across different periods, that is, it measures the influence of past data on the present:

$$\begin{aligned} A C F(k)=\rho _k=\frac{{\text {Cov}}\left( y_t, y_{t-k}\right) }{{\text {Var}}\left( y_t\right) } \end{aligned}$$
(7)

With the independent variable k representing the lag, the autocorrelation function (ACF) of a signal reflects its correlation with itself at different lags. In accordance with a previous study [2], ocular artifacts are assumed to show higher autocorrelation. As shown in Fig. 3, the ACFs of the artifactual components in this study have distinctive features. In the proposed EEMD-ICA\(_{autocor}\) method, an IC whose ACF has higher energy is identified as an artifactual component.
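Eq. 7 and an ACF-based score can be sketched as follows. The text does not define "energy" precisely, so the sum of squared ACF values over lags 1 to `max_lag` used here is an assumption, as are the function names, the lag range and the toy signals.

```python
import math
import random

def acf(y, max_lag):
    """Sample autocorrelation (Eq. 7) for lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    var = sum(v * v for v in d) / n
    return [sum(d[t] * d[t - k] for t in range(k, n)) / (n * var)
            for k in range(max_lag + 1)]

def acf_energy(y, max_lag):
    """Assumed energy measure: sum of squared ACF values over lags 1..max_lag."""
    return sum(r * r for r in acf(y, max_lag)[1:])

# Toy components: a slow, blink-like oscillation and white noise.
n = 200
slow = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

energy_slow = acf_energy(slow, max_lag=50)
energy_noise = acf_energy(noise, max_lag=50)
```

The slow component keeps a large ACF over many lags and therefore scores much higher than the noise, whose ACF drops to near zero after lag 0.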

Fig. 3. An example of the autocorrelation functions (ACFs) of ICs decomposed by EEMD-ICA. The ACFs of ICs 0, 2, 6 and 7 clearly tail off to zero, indicating that these components correspond to the EOG artifact.

3.3 Correlation with EOG Reference Channel

When ocular artifacts are rejected with a BSS-based method, an EOG reference channel is often introduced. Since BSS-based methods concentrate artifacts into individual ICs, the ICs that correspond to ocular artifacts are assumed to have a higher correlation with the EOG reference channel. If EOG is not available, an EEG channel near the eyes can be used as the EOG reference channel instead. In a study that combined CCA and MEMD for EEG artifact removal, correlation with the EOG reference channel was used to identify EOG artifacts [19]. In the proposed EEMD-ICA\(_{eogcor}\) method, the correlation of each IC of the EEG segment with \(EEG_{Pure}\) is calculated. The IC with the higher correlation with the original signal is identified as an artifactual component (Fig. 4).
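A correlation-based ranking of ICs can be sketched as below. Which signal serves as the reference is left to the caller; the function names and the toy data are assumptions. The absolute value is taken because the sign of an ICA component is arbitrary.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def most_correlated_ic(ics, reference):
    """Return (index, scores): the IC with the largest |correlation| to the reference."""
    scores = [abs(pearson(ic, reference)) for ic in ics]
    best = max(range(len(ics)), key=scores.__getitem__)
    return best, scores

# Toy data: a blink-like reference and two hypothetical ICs.
n = 200
reference = [math.exp(-((t - 100) / 5.0) ** 2) for t in range(n)]
ic_fast = [math.sin(2 * math.pi * t / 10) for t in range(n)]
ic_blink = [-0.8 * r + 0.05 * s for r, s in zip(reference, ic_fast)]  # sign-flipped copy

best_index, scores = most_correlated_ic([ic_fast, ic_blink], reference)
```

Here the sign-flipped, slightly perturbed copy of the reference is selected, while the fast oscillation scores close to zero.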

Fig. 4. An example of EEG simulation. (a) A segment of HEOG. (b) A segment of VEOG. (c) An example of \(Artifact_{EOG}\) generated using Eq. 4 with \(\mu \) = 1, \(\epsilon \) = 1. (d) An example of \(EEG_{Contaminated}\) generated using Eq. 5, with a chosen so that SNR = -1 according to Eq. 6.

4 Results and Discussion

To evaluate the performance of the four artifact removal methods, we simulated 30 groups of single-channel corrupted EEG segments, each containing 50 EEG segments. Segments within the same group were controlled to share the same SNR via the method in Sect. 2.5. To measure the influence of SNR on the performance of the artifact removal methods, data with SNRs ranging from -1 to 2 in steps of 0.1 (excluding 0) were generated for testing. To quantify the performance comprehensively, we use three performance metrics. The Normalized Mean Square Error (NMSE) is the most commonly used metric for quantifying the difference between the ground truth x and the predicted value \(\hat{x}\):

$$\begin{aligned} NMSE = \frac{\Vert x - \hat{x} \Vert _2^2}{\Vert x\Vert _2^2} \end{aligned}$$
(8)

In this study, x is the artifact-free data \(EEG_{Pure}\) and \(\hat{x}\) is the corresponding data \(EEG_{After}\) reconstructed from the simulated artifactual signal.

Fig. 5. Performance measures at different SNRs for EEMD-ICA\(_{kurt}\), EEMD-ICA\(_{entropy}\), EEMD-ICA\(_{autocor}\) and EEMD-ICA\(_{eogcor}\). The SNR of the synthetic data ranged from -1 dB to 2 dB in steps of 0.1 dB. The diagrams in the left column show the values of the performance metrics and the diagrams in the right column show the change in the performance metrics caused by applying the different artifact removal approaches. (a) NMSE (b) \(\varDelta \)NMSE (c) CC (d) \(\varDelta \)CC (e) SSIM (f) \(\varDelta \)SSIM

Another two metrics are the Cross Correlation (CC) and Structural Similarity Index (SSIM) [20]:

$$\begin{aligned} CC(x, \hat{x})=\frac{Cov(x, \hat{x})}{\sigma _x\sigma _{\hat{x}}}=\frac{\sigma _{x\hat{x}}}{\sigma _x\sigma _{\hat{x}}} \end{aligned}$$
(9)
$$\begin{aligned} SSIM(x, \hat{x})= (\frac{2\mu _x\mu _{\hat{x}}}{\mu _x^2 + \mu _{\hat{x}}^2 })\cdot ( \frac{2\sigma _x \sigma _{\hat{x}}}{\sigma _x^2 + \sigma _{\hat{x}}^2})\cdot (\frac{\sigma _{x\hat{x}}}{\sigma _x\sigma _{\hat{x}}}) \end{aligned}$$
(10)

where \(\mu _x\), \(\mu _{\hat{x}}\) are local means, \(\sigma _{x}\), \(\sigma _{\hat{x}}\) are standard deviations, and \(\sigma _{x\hat{x}}\) is the covariance between x and \(\hat{x}\). To better evaluate the contribution of the artifact removal approaches, the variation of each metric (denoted by \(\varDelta \)) is also taken into account. The results for every artifactual component identification approach are presented in Fig. 5.
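The three metrics in Eqs. 8 to 10 can be written down directly. In the sketch below the means, standard deviations and covariance are computed over the whole segment rather than over local windows, which is a simplifying assumption, and the function names and toy signals are hypothetical. Eq. 10 as written has no stabilizing constants, so it is undefined when both means are zero; the toy signal therefore carries an offset.

```python
import math

def _mean(x):
    return sum(x) / len(x)

def _cov(x, y):
    mx, my = _mean(x), _mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def nmse(x, x_hat):
    """Eq. 8: squared error normalized by the energy of the ground truth."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / sum(a * a for a in x)

def cc(x, x_hat):
    """Eq. 9: correlation coefficient between ground truth and reconstruction."""
    return _cov(x, x_hat) / math.sqrt(_cov(x, x) * _cov(x_hat, x_hat))

def ssim(x, x_hat):
    """Eq. 10 evaluated with whole-segment statistics (assumption)."""
    mx, mh = _mean(x), _mean(x_hat)
    sx, sh = math.sqrt(_cov(x, x)), math.sqrt(_cov(x_hat, x_hat))
    mean_term = 2 * mx * mh / (mx ** 2 + mh ** 2)
    variance_term = 2 * sx * sh / (sx ** 2 + sh ** 2)
    structure_term = _cov(x, x_hat) / (sx * sh)
    return mean_term * variance_term * structure_term

# Toy ground truth (with an offset) and a slightly perturbed reconstruction.
x = [1.0 + math.sin(2 * math.pi * t / 50) for t in range(200)]
x_hat = [v + 0.1 * (-1) ** t for t, v in enumerate(x)]

metrics_identical = (nmse(x, x), cc(x, x), ssim(x, x))
metrics_perturbed = (nmse(x, x_hat), cc(x, x_hat), ssim(x, x_hat))
```

A perfect reconstruction gives NMSE = 0 and CC = SSIM = 1, so lower NMSE and higher CC and SSIM indicate better artifact removal.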

Among the four methods, EEMD-ICA\(_{autocor}\) has the best performance in terms of all metrics. EEMD-ICA\(_{entropy}\) is slightly weaker than EEMD-ICA\(_{autocor}\) while EEMD-ICA\(_{kurt}\) and EEMD-ICA\(_{eogcor}\) have significant limitations.

As shown in Fig. 5 (a), the EEG data reconstructed through EEMD-ICA\(_{autocor}\) and EEMD-ICA\(_{entropy}\) maintain a generally low NMSE, indicating higher similarity between the reconstructed EEG and \(EEG_{Pure}\). However, EEMD-ICA\(_{entropy}\) is considered worse than EEMD-ICA\(_{autocor}\) for two reasons: its overall NMSE is higher, and its NMSE curve intersects the baseline curve. The baseline curves represent the metrics calculated from \(EEG_{Contaminated}\) and \(EEG_{Pure}\), i.e., the values of the metrics when nothing is done. An intersection with the baseline curve indicates that, beyond the intersection point, applying the method is worse than doing nothing in terms of that metric. This intersection is called the “critical point”. For the figures in the upper row of Fig. 5, the critical point is the intersection between a method's curve and the baseline curve. For the figures in the lower row, the critical point is the intersection between a method's curve and the horizontal zero line.
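Locating a critical point amounts to finding where the difference between a method's curve and the baseline changes sign. The sketch below does this by linear interpolation between sampled SNRs. The function name is hypothetical, and the numbers are toy values invented for illustration; they are not results from this study.

```python
def critical_point(snrs, method_vals, baseline_vals):
    """First SNR at which the method curve crosses the baseline, or None if it never does."""
    diffs = [m - b for m, b in zip(method_vals, baseline_vals)]
    if not diffs:
        return None
    for i in range(1, len(diffs)):
        d0, d1 = diffs[i - 1], diffs[i]
        if d0 == 0:
            return snrs[i - 1]
        if d0 * d1 < 0:
            # Linear interpolation of the zero crossing between two samples.
            t = d0 / (d0 - d1)
            return snrs[i - 1] + t * (snrs[i] - snrs[i - 1])
    return snrs[-1] if diffs[-1] == 0 else None

# Toy NMSE curves (invented): the method is better at low SNR, worse at high SNR.
snrs = [0.0, 1.0, 2.0]
method_nmse = [0.2, 0.3, 0.5]
baseline_nmse = [0.6, 0.4, 0.3]

cp = critical_point(snrs, method_nmse, baseline_nmse)
no_cp = critical_point(snrs, [0.1, 0.1, 0.1], baseline_nmse)
```

For the lower-row plots the same function can be used with a baseline of zeros, since the critical point there is the crossing with the horizontal zero line.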

Fig. 6. A case of artifactual component (AC) detection. All four approaches were applied to the same contaminated EEG segment (SNR = 1.5). (a), (c), (e), (g) are the ACs detected by EEMD-ICA\(_{autocor}\), EEMD-ICA\(_{kurt}\), EEMD-ICA\(_{entropy}\) and EEMD-ICA\(_{eogcor}\), respectively. (b), (d), (f), (h) are the corresponding lost neural activities. In this case, the power of the neural activity lost by EEMD-ICA\(_{eogcor}\) is about twice that lost by EEMD-ICA\(_{autocor}\).

EEMD-ICA\(_{kurt}\) and EEMD-ICA\(_{eogcor}\) have similar overall performance in terms of NMSE, while EEMD-ICA\(_{kurt}\) has the advantage that its NMSE critical point appears later. In terms of CC, although the performance of EEMD-ICA\(_{kurt}\) and EEMD-ICA\(_{eogcor}\) continuously improves as the SNR increases, this trend is highly deceptive. As shown in Fig. 5 (e), the curves of these two methods lie entirely below the horizontal zero line and keep decreasing, indicating that the two methods make negative contributions to CC at all SNRs in this study. For EEMD-ICA\(_{autocor}\) and EEMD-ICA\(_{entropy}\), the performance in terms of CC is very close. However, their contribution to CC turns negative at around SNR = 1.0, indicating that these two methods may not be suitable for high-SNR signals.

As shown in Fig. 5 (c), (f), all four approaches cause a decrease in SSIM at most SNRs. This may mean that the EEMD-ICA-based artifact removal process inevitably leads to a loss of structural information, probably because the ICs identified as artifacts still contain components from normal neural activity. Relatively speaking, EEMD-ICA\(_{autocor}\) performs best in terms of SSIM. As presented in Fig. 6, EEMD-ICA\(_{autocor}\) loses the least neural activity. Considering NMSE, CC and SSIM together, EEMD-ICA\(_{autocor}\) is the best-performing artifact removal approach among the four proposed approaches.

5 Conclusion

This study developed four EEMD-ICA-based approaches to artifact rejection for noisy neural data: EEMD-ICA\(_{kurt}\), EEMD-ICA\(_{entropy}\), EEMD-ICA\(_{autocor}\) and EEMD-ICA\(_{eogcor}\). These approaches share the same signal decomposition procedure but differ in how artifactual components are identified. The effectiveness of the proposed approaches was examined with semi-simulated data.

When using NMSE as the metric, EEMD-ICA\(_{autocor}\) significantly outperformed the other approaches. It can be almost twice as good as EEMD-ICA\(_{entropy}\). When the SNR was high, the difference between the four approaches in absolute NMSE value was reduced. However, all methods except EEMD-ICA\(_{autocor}\) showed a negative contribution to NMSE when the SNR was high. When the SNR was larger than 1.0, EEMD-ICA\(_{eogcor}\) resulted in a worse NMSE than doing nothing, while the same happened for EEMD-ICA\(_{entropy}\) and EEMD-ICA\(_{kurt}\) when the SNR was larger than 1.5. This indicates that EEMD-ICA\(_{autocor}\) showed the best versatility.

When using CC as the metric, the performance of EEMD-ICA\(_{eogcor}\) and EEMD-ICA\(_{kurt}\) was unacceptable because of their negative contribution to CC at all SNRs, while EEMD-ICA\(_{autocor}\) and EEMD-ICA\(_{entropy}\) performed similarly. In terms of CC, these two approaches are only suitable for data with an SNR below 1.0. When using SSIM as the metric, all four approaches performed poorly. Although their performance improved as the SNR increased, these approaches contributed negatively to SSIM.

Generally speaking, EEMD-ICA\(_{autocor}\) and EEMD-ICA\(_{entropy}\) are effective in artifact removal. However, a significant drawback of these methods is that they perform worse than the baseline when the SNR is high, which limits their scope of application to severely contaminated EEG signals.

The effectiveness of the four artifact rejection approaches has been evaluated, and these approaches can serve as a preprocessing step that improves the performance of subsequent tasks. However, it is vital to estimate the SNR of the data before applying them, because when the SNR is high these methods may have a counterproductive effect.