1 Introduction

Electroencephalography (EEG) is an electrophysiological monitoring technique for recording the electrical activity of the brain. It is commonly used in neuroscience, clinical research, cognitive science, neurolinguistics, and psychophysiological studies. EEG signals provide valuable information, and features extracted from them help diagnose the presence or absence of neurological disorders. EEG signals are also used extensively in brain–computer interface (BCI) applications to predict certain movements/activities the user performs. To ensure the correctness of modern BCI systems, the EEG signals should contain nothing other than the desired cerebral activity. In practice, EEG signals are contaminated significantly by undesirable sources such as electrooculogram (EOG) artifacts, electromyogram (EMG) artifacts, electrocardiogram (ECG) artifacts, and power line noise [1]. Thus, removing these distortions is an essential step in pre-processing EEG signals. Among the different artifacts, the EOG artifact produced by eye movement is the most common and is the primary consideration in this work. EOG artifacts are undesirable high-amplitude, low-frequency patterns in EEG signals caused by eye blinking and eye movement while the EEG is being recorded. Many methods have been proposed and implemented to remove EOG artifacts from single- and multi-channel EEG signals, such as regression, blind source separation (BSS), singular spectrum analysis (SSA), and hybrid methods. Independent component analysis (ICA) and its variants, like wavelet-enhanced ICA (wICA), are the most widely used BSS approaches [1,2,3]. The main drawback of ICA is that it does not ensure that artifact-related independent components (ICs) contain only noise and no neuronal activity, especially in the presence of channel noise, where the number of source signals is essentially doubled while the number of sensors remains the same.
Since ICA can only detect a number of sources equal to or less than the number of sensors, the ICA components will contain a mixture of noise and brain activity. This issue is mitigated by incorporating the wavelet transform with ICA in methods like wICA. However, the performance of wICA greatly depends on the chosen wavelet basis function. For this reason, ICA-based approaches cannot be used in a real-time BCI system. The traditional method for EOG artifact rejection is the regression method, in which separate vertical EOG (VEOG) and horizontal EOG (HEOG) channels are required to record the VEOG and HEOG signals, respectively. In general, two electrodes positioned above and below the eye are used to calculate the VEOG component, whereas two electrodes placed outside the outer canthus of each eye are used to calculate the HEOG component [3]. The recorded VEOG and HEOG are regressed with the contaminated EEG to obtain clean EEG. Regression methods require four extra electrodes/sensors and suffer from bidirectional interference. In addition, a higher level of knowledge is required to attach these four extra electrodes properly, and any electrode pop or manufacturing fault in the reference channel might introduce noise into the data of the other EEG channels. In [4], the authors suggested that virtual/simulated reference channels (FP1 and FP2 channels from the prefrontal region) can be an option if adopting an actual reference channel is not viable. As these channels (FP1 and FP2) also contain EEG activity, using them directly as reference channels can introduce additional noise. Therefore, this paper proposes a singular spectrum analysis (SSA)–non-negative matrix factorization (NMF)-based ocular artifact removal (SNOAR) method, which combines SSA and NMF for VEOG and HEOG calculation. To the best of the authors’ knowledge, this is the first time the combination of SSA and NMF has been adopted to estimate VEOG and HEOG components using frontal EEG electrodes. The main contributions of this paper are as follows:

  1. SSA was used to estimate EOG artifacts from the selected channels from the frontal region (FP1, FP2, F7, and F8).

  2. NMF was used to decompose the estimated EOG artifacts into VEOG and HEOG components.

  3. With the help of the estimated VEOG and HEOG components, the artifact-free EEG was obtained using the time domain regression approach.

The performance of the proposed method was evaluated and compared with selected EOG artifact removal methods using EEG records/signals from Klados dataset [5] and KARA ONE dataset [6]. The simulation results indicate that the proposed method outperforms the selected EOG artifact removal methods in terms of root-mean-square error (RMSE) and delta band energy ratio. In addition, less power spectral density (PSD) difference was observed between the clean EEG and filtered version of the contaminated EEG segment obtained after the proposed method.

The organization of this paper is as follows: Sect. 2 presents the review of some of the significant research works on EOG artifact removal. The fundamentals of the proposed EOG artifact removal method and its mathematical backgrounds are reported in Sect. 3. Simulation results and discussion are presented in Sect. 4. The conclusion and future works are discussed in Sect. 5.

2 Previous works

Many research works have been proposed and implemented to remove EOG artifacts from single- and multi-channel EEG signals. This section reports some of the significant research works available in the literature. It is divided into two subsections: multi-channel and single-channel approaches. As most multi-channel approaches cannot work for applications where only a single electrode is present, a separate subsection for a single-channel scenario is added.

2.1 Multi-channel ocular artifact removal

Various methods such as regression, blind source separation, variational mode decomposition (VMD), empirical mode decomposition (EMD), SSA and their hybrid approaches, and wavelet transform are available for multi-channel ocular artifact removal.

2.1.1 Regression methods

Regression method is a traditional method for ocular artifact removal. This method requires two additional reference channels, VEOG and HEOG. The artifact-free EEG data were obtained from contaminated EEG signals by removing artifacts calculated using VEOG, HEOG data, and simple linear regression [3, 7].

2.1.2 Filtering methods

Ocular artifacts can also be removed using filtering methods such as adaptive filtering/recursive least squares (RLS) filtering and Wiener filtering. Adaptive filtering is based on the assumption that the signal and the artifact are uncorrelated [8]. In this method, an artifact-correlated signal is first generated with the help of a filter and a reference channel and then subtracted from the EEG channel. The filter weights are updated iteratively to capture the effect of the artifact on the EEG signal at each time instant. Thus, adaptive filtering computes different weights at different time instants, whereas a linear regression-based technique assigns a constant artifact weight to each electrode. Wiener filtering eliminates the requirement for a reference signal. It is a statistical method that generates a linear time-invariant filter that minimizes the mean square error between the intended signal and its estimate [9]. The minimization uses estimates of the PSDs of the signal and the artifact; hence, it does not need a reference waveform.
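The reference-based cancellation described above can be illustrated with a least-mean-squares (LMS) update, used here as a simpler stand-in for RLS. This is only a sketch: the function name `lms_cancel`, the step size `mu`, and the number of taps are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def lms_cancel(primary, reference, n_taps=1, mu=0.01):
    """Adaptive noise cancellation with an LMS filter (illustrative sketch).

    primary   : contaminated channel (signal + artifact)
    reference : channel correlated with the artifact only
    Returns the cleaned output and the final filter weights.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(n_taps)
    out = np.zeros(primary.size)
    for n in range(primary.size):
        # Most recent n_taps reference samples, newest first, zero-padded at the start.
        x = np.zeros(n_taps)
        k = min(n_taps, n + 1)
        x[:k] = reference[n - k + 1:n + 1][::-1]
        # Error = primary minus the current artifact estimate; this is the cleaned sample.
        e = primary[n] - w @ x
        # Weights change at every time instant, unlike a fixed regression coefficient.
        w = w + 2.0 * mu * e * x
        out[n] = e
    return out, w
```

With a reference that is uncorrelated with the underlying signal, the weights drift toward the coefficient that scales the reference into the primary channel, which is the time-varying counterpart of the constant regression weight.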

2.1.3 BSS methods

Some popular BSS-based artifact removal methods are principal component analysis (PCA), ICA, and canonical correlation analysis (CCA). PCA uses variance to determine eye blink- and eye movement-related components to obtain artifact-free signals [1]. To eliminate artifacts with PCA, the artifact and the EEG must be uncorrelated, but this requirement is difficult to achieve in a real-world context. The ICA approach assumes that the EEG output at each electrode is a mixture of artifact and pure EEG signals. In the ICA approach, EEG signals are decomposed into independent components, and then parameters such as kurtosis, dipole energy, and entropy are calculated to identify the ICs that contain artifacts. To obtain artifact-free EEG recordings, these ICs are eliminated and the remaining ICs are recombined [1]. It has recently become more common to combine ICA and wavelet-based processing, such as wavelet-enhanced ICA (wICA) [10]. In the wICA-based method, ICA is performed on the EEG data, and the obtained ICs are then passed through wavelet de-noising to achieve artifact-free data. Another method, called wavelet-ICA (WICA), first partitions the EEG data into sub-bands using the wavelet transform and then applies ICA to the selected artifact-related wavelet components. The ICs linked to the artifacts are selected and cancelled [11]. Finally, clean artifact-free EEG signals are obtained after performing inverse ICA and wavelet reconstruction. The main limitation of these techniques is their dependency on the selected wavelet basis functions. CCA is another method for removing artifacts that calculates components from uncorrelated sources. This approach is less time-consuming than the ICA-based approach.

2.1.4 EMD-based methods

Bivariate empirical mode decomposition (BEMD) and multivariate empirical mode decomposition (MEMD) are widely used for ocular artifact removal [12, 13]. The BEMD-based approach had two stages. The first stage filtered electrical and environmental noise from the EOG channel by applying BEMD on fractional Gaussian noise and EOG signal. This filtered EOG signal was used as a reference in second stage BEMD with each EEG channel separately to achieve artifact-free EEG signals. In the MEMD-based artifact removal, the data from EEG channels, a reference channel (fractional Gaussian noise), and EOG channel were decomposed using MEMD. After that, a threshold intrinsic mode function (IMF) was chosen based on the energy and time period criteria related to the reference channel. All the IMFs, having higher order than threshold IMF, were considered EOG artifacts and removed from the recorded EEG signals.

2.1.5 NMF-based methods

Damon et al. [14] used a combination of NMF and Wiener filtering for ocular artifact removal. In this method, the short-time Fourier transform (STFT) matrix related to the EEG channels was decomposed into weight and base matrix using NMF decomposition. The base matrix entries were initialized with the help of the reference EOG channel. Once this NMF model was established, the artifact and decontaminated EEG signals were easily reconstructed through Wiener filtering. Another method [15] combined ensemble EMD (EEMD) with NMF. In this method, NMF was used to divide the normalized EEG data into components. By using fractal dimension, the components with ocular artifacts were automatically selected. Following that, EMD adaptively divided these components’ temporal activity into a few intrinsic mode functions (IMFs). Ocular artifact-related IMFs were eliminated, and the de-noised EEG data were finally rebuilt.

2.2 Single-channel artifact removal

Source decomposition methods such as SSA, EMD, PCA, wavelet transform, VMD, NMF, and hybrid approaches such as EEMD-PCA, EEMD-CCA, and SSA-ICA are popular among researchers for single-channel artifact removal.

2.2.1 Wavelet transform-based methods

In discrete wavelet transform (DWT)–adaptive noise canceller (ANC) approach, reference EOG artifact was determined using DWT. This reference EOG was applied to ANC to filter the contaminated EEG.

2.2.2 SSA-based methods

SSA is a subspace-based technique that gathers multidimensional information from single-channel EEG data. Some artifact removal methods using SSA are SSA-ANC, SSA-ICA with wavelet thresholding, SSA-K means, and overlap segmented adaptive SSA (OvASSA) with ANC. In the SSA-ANC approach [16], a reference signal was calculated using SSA with the help of the local mobility of the eigenvectors. Then this reference signal and the EEG channel data were applied to the ANC to extract the artifact signal. Finally, the artifact-free EEG signal was obtained by subtracting the estimated artifact signal from the corrupted signal.

In the OvASSA-ANC approach [17], the overlap SSA process was first applied to EEG data. Then, depending on amplitude, one or two reconstructed components were adaptively grouped and considered as reference EOG for ANC. As ICA is unsuitable for single-channel EOG artifact removal, in [18], a hybrid approach based on SSA and ICA was proposed. In this work, SSA decomposes single-channel data into multidimensional signals. Then ICA was applied to this multidimensional data. ICs that contained artifacts were then corrected using wavelet thresholding.

In the SSA-K means-based method [19], EEG signal was first converted into multivariate data using embedding. Then time domain features and clustering were used to distinguish these multivariate signals into different clusters. Source signals created using signals of different clusters were divided into the artifact and the EEG signal. The artifact’s position in the signal was interpreted using the binary template created by this artifact data. Then SSA-based analysis was applied only at those positions to remove the artifact component.

2.2.3 EMD-based methods

Another popular family of single-channel artifact removal methods is EMD and its variants. In the EEMD-PCA-based approach [20], the portions of the EEG signal that contain eye blinks were first captured using windowing and time domain parameters such as variance and skewness, and EEMD was then applied only to those portions. Next, PCA was performed on the IMFs, and the principal components that contain the artifact were selected and removed. Another approach, named EEMD-ICA [21], used EEMD decomposition to convert single-channel data into multidimensional data. Then, ICA was applied, and the ICs corresponding to the artifacts were removed. In the EEMD-CCA approach [22], after applying EEMD decomposition, the IMFs that contain the artifact were selected, and CCA was then applied to those IMFs. The source signal that corresponds to the artifact was set to zero.

2.2.4 Other methods

In VMD [23]-based ocular artifact removal method, firstly, the epoch corresponding to artifact was calculated using multiscale modified sample entropy (mMSE). After that, VMD was applied, and band-limited IMFs (BIMF) were obtained using a predefined parameter that helps BIMF capture ocular artifacts. Finally, a regression-based approach was used to capture a clear EEG signal. Another recent approach called Fourier Bessel series expansion-based empirical wavelet transform (FBSE-EWT) [24] used FBSE-EWT for obtaining δ, θ, α, β, and γ rhythms. After that, enhanced local polynomial (LP) approximation-based total variation (TV) filter was applied to δ rhythm data to calculate LP and TV components. These components were subtracted from δ rhythm, and then this filtered δ rhythm was added to all other rhythms data to construct an artifact-free EEG data.

From the previous works on EOG artifact removal from multi-channel EEG signals, it can be observed that BSS-related approaches were unable to distinguish the artifact perfectly, and therefore, a large amount of useful information was removed. These approaches also have other disadvantages, such as a relatively high computational complexity and the need for proper identification of blinking components. Wavelet transform-based EOG artifact removal methods rely on the choice of mother wavelet, decomposition level, and threshold. EMD-based methods are susceptible to noise and thus face the issue of mode mixing. The performance of the regression-based methods is affected by cross-contamination between EEG and EOG. In single-channel-based works, it is observed that SSA and its variants provide satisfactory performance in ocular artifact removal.

3 Materials and method

3.1 Description of dataset

The proposed technique was evaluated using the EEG signals from the semi-simulated Klados dataset and the real KARA ONE dataset. The Klados dataset consists of EEG recordings of 27 subjects (males and females) recorded using the standard 19 channels/electrodes (FP1, FP2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, T6, FZ, CZ, PZ). In this dataset, odd index electrodes were referenced to the left mastoid, even index electrodes to the right mastoid, and center electrodes to the average of the right and left mastoids. EEG signals were recorded with a sampling frequency of 200 Hz, and VEOG and HEOG components were also recorded separately during each recording session. The KARA ONE dataset includes the real EEG signals of 12 participants recorded using a 64-channel Neuroscan headset with a sampling frequency of 1 kHz. Only EEG signals recorded during imagination of the phonemes /uw/, /tiy/, /pat/, and /n/ were used in this work. All signals were down-sampled to 250 Hz and band-pass filtered between 0.1 and 45 Hz. A common average reference was also used.

3.2 Proposed method

This study proposes a new method, SNOAR, for removing EOG artifacts from multi-channel EEG signals. The proposed method does not require extra electrodes/sensors to record EOG signals and uses information of frontal channels (FP1, FP2, F7, and F8) for artifact removal. SSA is a prominent source decomposition method that can separate sources even if they overlap in temporal frequency space. The previous works on SSA and its variations for single-channel artifact removal reveal that it is effective in separating artifacts from cerebral activity. Estimating EOG artifacts using SSA from each channel is time-consuming; hence, only four channels close to the left and right eyes were selected in this work. Subsequently, NMF was employed to separate VEOG and HEOG components from the estimated EOG artifacts using SSA. Finally, the estimated VEOG and HEOG components were used as a reference, and the time domain regression method was used to obtain clean artifact-free EEG signals. For estimation of regression coefficients, least square estimation method was used. The proposed method's overall flowchart and its subsections are depicted in Figs. 1, 2, and 3.

Fig. 1 Flowchart of proposed method

Fig. 2 SSA-based artifact calculation

Fig. 3 VEOG and HEOG component calculation using NMF

Pseudocode:

Input: Contaminated EEG time series data (\({EEG}_{contaminated}\))

Output: Artifact-free EEG data (\({EEG}_{true}\))

  1. Select the 4 electrodes nearest to the eyes.

  2. Calculate the artifact present in these electrodes using SSA.

  3. Arrange these artifacts in one matrix (Q).

  4. Find the magnitude (m) of the most negative entry of this matrix and add it to every entry to make the matrix non-negative.

  5. Perform NMF decomposition of the non-negative artifact matrix into W (4 × 2) and H (2 × N) matrices.

  6. Construct a matrix P (4 × N) whose entries are all m.

  7. Calculate matrix H2 = [Pseudo inverse (W) × P].

  8. Calculate the 2 EOG components using the W, H, and H2 matrices.

  9. Calculate the artifacts present in the other channels using the EOG components and linear regression.

  10. Subtract the artifact present in each EEG channel to get artifact-free cerebral activity.

3.2.1 Estimation of EOG artifact using SSA

SSA is a subspace-based technique that incorporates elements of multivariate statistics, multivariate geometry, signal processing, and classical time series analysis. Four channels having dominant VEOG and HEOG characteristics were selected for estimating artifact. SSA comprises 5 main stages: embedding, decomposition, grouping, diagonal averaging, and reconstruction [16, 25].

In the embedding step, a multidimensional matrix (Y) is created from J lagged vectors of length L taken from the single-dimension data. If X is a contaminated single-channel EEG signal of length N, then the embedding (trajectory) matrix is

$$Y=\left[\begin{array}{cccc}X(1)& X(2)& \cdots & X(J)\\ X(2)& X(3)& \cdots & X(J+1)\\ \vdots & \vdots & \ddots & \vdots \\ X(L)& X(L+1)& \cdots & X(N)\end{array}\right]$$

where J = N – L + 1 and L is a window length. The value of L is determined by the following criteria:

$$L\ge \frac{{f}_{s}}{f}$$

where f is the lowest frequency of interest (minimum frequency which needs to be taken into consideration) and fs is the sampling frequency.
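The window-length criterion and the embedding step can be sketched as follows. The helper names `min_window_length` and `embed` are illustrative assumptions, and indices are zero-based in the code while the matrix above is one-based.

```python
import math
import numpy as np

def min_window_length(fs, f_low):
    """Smallest integer L satisfying L >= fs / f_low."""
    return math.ceil(fs / f_low)

def embed(x, L):
    """Build the L x J trajectory matrix with J = N - L + 1 and Y[r, c] = x[r + c]."""
    x = np.asarray(x, dtype=float)
    N = x.size
    if not 1 <= L <= N:
        raise ValueError("window length L must satisfy 1 <= L <= N")
    J = N - L + 1
    rows = np.arange(L)[:, None]   # lag index within a window
    cols = np.arange(J)[None, :]   # starting sample of each lagged vector
    return x[rows + cols]
```

For a sampling frequency of 200 Hz and a lowest frequency of interest of 0.5 Hz, `min_window_length(200, 0.5)` gives 400. Every anti-diagonal of the resulting matrix holds copies of a single sample, which is what later allows the matrix to be mapped back to a series by diagonal averaging.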

After embedding, the trajectory matrix (Y) is decomposed into a sum of weighted rank-one matrices using singular value decomposition (SVD). According to SVD

$$Y=U\Sigma {V}^{T}=\sum_{i=1}^{R}{X}_{i}\quad \text{with}\quad {X}_{i}={\sigma }_{i}{u}_{i}{v}_{i}^{T}$$
(1)

where \(u_i\) and \(v_i\) are the i-th left and right singular vectors (the i-th columns of U and V), \(\sigma_i\) is the corresponding singular value, and R is the rank of Y.

The next step in SSA is grouping. In this step, the Xi matrices are grouped using a predefined criterion. In this work, the local mobility of the eigenvector (ui) was used as the grouping criterion [16]. The local mobility mf of ui is computed using Eq. 2:

$${m}_{f}= \frac{\sqrt{\frac{\sum_{j=1}^{L-1}{z}^{2}(j)}{L-1}}}{\sqrt{\frac{\sum_{j=1}^{L-1}{u}_{i}^{2}(j)}{L-1}}}\quad \text{where}\quad z\left(j\right)= {u}_{i}\left(j\right)-{u}_{i}\left(j-1\right)$$
(2)

After calculating mf for each eigenvector, if its value is larger than 0.1, the corresponding Xi matrix is assigned to the artifact group. The matrices of the artifact group are added, and in the final step, the resulting multidimensional matrix representing the artifact is converted back into a single-dimension signal using diagonal averaging [16].
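The decomposition, grouping, and diagonal averaging steps can be sketched together as below. This is a minimal illustration rather than the authors' implementation: the function names are assumptions, the mobility is computed as a plain ratio of root-mean-square values (so its normalization may differ slightly from Eq. 2), and the grouping rule is passed in as a callable `select` so that the threshold stays explicit.

```python
import numpy as np

def local_mobility(u):
    """RMS of the first difference of u divided by the RMS of u."""
    u = np.asarray(u, dtype=float)
    z = np.diff(u)
    return np.sqrt(np.mean(z ** 2)) / np.sqrt(np.mean(u ** 2))

def diagonal_average(M):
    """Average each anti-diagonal of an L x J matrix into a series of length L + J - 1."""
    L, J = M.shape
    out = np.zeros(L + J - 1)
    counts = np.zeros(L + J - 1)
    for r in range(L):
        # Row r of the trajectory matrix covers samples r .. r + J - 1.
        out[r:r + J] += M[r]
        counts[r:r + J] += 1
    return out / counts

def ssa_group(x, L, select):
    """Reconstruct the part of x carried by the components whose mobility passes `select`."""
    x = np.asarray(x, dtype=float)
    J = x.size - L + 1
    Y = np.stack([x[r:r + J] for r in range(L)])          # embedding
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)      # decomposition
    mob = np.array([local_mobility(U[:, i]) for i in range(s.size)])
    keep = np.array([bool(select(m)) for m in mob])       # grouping
    grouped = (U[:, keep] * s[keep]) @ Vt[keep]           # sum of selected elementary matrices
    return diagonal_average(grouped), mob                 # back to a 1-D series
```

The threshold rule stated in the text would be passed as `select=lambda m: m > 0.1`. Because the steps are linear, the series reconstructed from a group and the series reconstructed from its complement add up to the original signal.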

3.2.2 Estimation of HEOG and VEOG components using NMF

After estimating the EOG artifacts from the four selected channels using SSA, a matrix Q of dimension (p × N) is created from these data, where p represents the number of channels used for artifact calculation and N represents the number of samples. In the next step, NMF is used for separating the HEOG and VEOG components.

NMF is a commonly used algorithm in the field of multivariate analysis that decomposes a non-negative matrix A into the product of two lower-dimension matrices, a weight matrix (W) and a base matrix (H), by optimizing a cost function. Multiplicative update, gradient descent, and alternating least squares (ALS) are the fundamental algorithms for solving this optimization problem. In this paper, the ALS algorithm was used in NMF decomposition, and its pseudocode is given below:

[Figure: pseudocode of the ALS algorithm for NMF]
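A minimal sketch of a common ALS formulation of NMF is given here for orientation: each factor is obtained by an unconstrained least-squares solve with the other factor fixed, and negative entries are then set to zero. Whether this matches the exact variant of [26] is an assumption, as are the function name `nmf_als`, the random initialization, and the `seed` parameter.

```python
import numpy as np

def nmf_als(A, rank, n_iter=100, seed=0):
    """Approximate a non-negative matrix A (p x N) as W (p x rank) @ H (rank x N)."""
    A = np.asarray(A, dtype=float)
    if np.any(A < 0):
        raise ValueError("A must be non-negative")
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], rank))   # random non-negative start (assumed)
    for _ in range(n_iter):
        # Least-squares solve for H with W fixed, then project onto H >= 0.
        H = np.linalg.lstsq(W, A, rcond=None)[0]
        H = np.maximum(H, 0.0)
        # Least-squares solve for W with H fixed, then project onto W >= 0.
        W = np.linalg.lstsq(H.T, A.T, rcond=None)[0].T
        W = np.maximum(W, 0.0)
    return W, H
```

For the setting described in this section, the call would be `nmf_als(A, rank=2, n_iter=100)` with A of size 4 × N. The factorization is not unique, so the order and scale of the two rows of H can vary with the initialization.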

The primary constraint of NMF decomposition is the non-negativity of the input data matrix, whereas the matrix Q created from the artifacts has both positive and negative values. The negative values were made non-negative by adding a constant value m (the magnitude of the most negative entry of Q) to every entry, which gives a non-negative matrix A. Then the ALS-based NMF method [26] was used with one hundred iterations to decompose this A (4 × N dimension) matrix into W (4 × 2 dimension) and H (2 × N dimension) matrices. One hundred iterations were chosen because the ALS algorithm converges in fewer iterations. As W and H were calculated from the non-negative matrix (Q + m), the rows of the H matrix were amplitude-shifted versions of the original base matrix, which can be seen in Fig. 4.

Fig. 4 Reference VEOG and HEOG components as well as VEOG and HEOG components calculated through constant addition with and without eliminating amplitude shifting using H2 matrix

In order to eliminate the effect of amplitude shifting, VEOG and HEOG components were calculated using Eqs. 3 and 4:

$$EOG1=W\left(1,1\right)\times \left(H\left(1,:\right)-H2\left(1,:\right)\right)$$
(3)
$$EOG2=W\left(1,2\right)\times \left(H\left(2,:\right)-H2\left(2,:\right)\right)$$
(4)

where H2 = [Pseudo inverse (W) \(\times\) P] and P is a 4 × N matrix whose entries are all m. One of the components EOG1 and EOG2 represents VEOG and the other HEOG. The EOG1 and EOG2 components computed from Eqs. 3 and 4 are illustrated in Fig. 4. The figure shows that EOG1 and EOG2 have the same shape as the original HEOG and VEOG components, respectively.
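The reasoning behind Eqs. 3 and 4 can be made explicit: since A = Q + P ≈ W × H, the unshifted matrix is Q ≈ W × H − P ≈ W × (H − H2), which holds exactly when the columns of P lie in the column space of W. The sketch below illustrates the constant shift and the correction; the function names are assumptions, and the offset is taken as zero when Q has no negative entries.

```python
import numpy as np

def shift_to_nonnegative(Q):
    """Return A = Q + m and the offset m = max(0, -min(Q))."""
    Q = np.asarray(Q, dtype=float)
    m = max(0.0, -float(Q.min()))
    return Q + m, m

def eog_components(W, H, m):
    """Eqs. 3 and 4: remove the offset contribution from H, then scale by the first row of W."""
    p, N = W.shape[0], H.shape[1]
    P = np.full((p, N), m)            # constant matrix that was added to Q
    H2 = np.linalg.pinv(W) @ P        # part of H that only encodes the offset
    eog1 = W[0, 0] * (H[0] - H2[0])
    eog2 = W[0, 1] * (H[1] - H2[1])
    return eog1, eog2, H2
```

In a full pipeline, W and H would come from the NMF of `shift_to_nonnegative(Q)[0]`; scaling by the first row of W expresses both components at the amplitude they have in the first selected channel.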

3.2.3 Regression

Regression is a traditional method that assumes that each channel is the cumulative sum of pure EEG data and a proportion of the artifact. This work estimates the VEOG and HEOG components using a combination of SSA and NMF. Afterwards, the artifact-free EEG signal is obtained by subtracting the scaled VEOG and HEOG estimates from the contaminated EEG using the linear regression equation given in Eq. 5 [2]:

$${EEG}_{true}= {EEG}_{contaminated}- {B}_{v}\times EOG1- {B}_{h}\times EOG2$$
(5)

where Bv and Bh are propagation coefficients for EOG1 and EOG2, respectively, and these coefficients were calculated using the following equation [25]:

$${B}_{v} = \left(\frac{{r}_{cv}- {r}_{ch}.{r}_{vh}}{1-{r}_{vh}^{2}}\right)\times \left(\frac{{sd}_{c}}{{sd}_{v}}\right)$$
(6)
$${B}_{h}= \left(\frac{{r}_{ch}- {r}_{cv}.{r}_{vh}}{1-{r}_{vh}^{2}}\right)\times \left(\frac{{sd}_{c}}{{sd}_{h}}\right)$$
(7)

where rcv, rch, and rvh are correlation between EEGcontaminated data and EOG1, correlation between EEGcontaminated data and EOG2, correlation between EOG1 and EOG2, respectively, and sdc, sdv, and sdh are standard deviation of EEGcontaminated, EOG1 and EOG2, respectively.
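A short sketch of Eqs. 5 to 7 for a single channel follows. Bh is written as the mirror image of Bv, with rch − rcv·rvh in the numerator, which is the standard two-predictor least-squares form; the function names are illustrative assumptions.

```python
import numpy as np

def propagation_coefficients(eeg, eog1, eog2):
    """Bv and Bh from pairwise correlations and standard deviations (Eqs. 6 and 7)."""
    r_cv = np.corrcoef(eeg, eog1)[0, 1]
    r_ch = np.corrcoef(eeg, eog2)[0, 1]
    r_vh = np.corrcoef(eog1, eog2)[0, 1]
    sd_c, sd_v, sd_h = np.std(eeg), np.std(eog1), np.std(eog2)
    denom = 1.0 - r_vh ** 2
    b_v = (r_cv - r_ch * r_vh) / denom * (sd_c / sd_v)
    b_h = (r_ch - r_cv * r_vh) / denom * (sd_c / sd_h)
    return b_v, b_h

def remove_eog(eeg, eog1, eog2):
    """Eq. 5: subtract the scaled EOG components from the contaminated channel."""
    b_v, b_h = propagation_coefficients(eeg, eog1, eog2)
    return eeg - b_v * eog1 - b_h * eog2
```

These coefficients coincide with the least-squares slopes of a regression of the contaminated channel on EOG1 and EOG2 with an intercept, so the function would be applied once per EEG channel.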

3.3 Performance measure

To validate the performance of proposed method, the following performance metrics are used.

3.3.1 RMSE

For simulated data, RMSE between true EEG data and artifact-free EEG data achieved using the prescribed artifact removal method was calculated using the following equation [10]:

$$RMSE\left(g\right)= {\left(\frac{\sum_{i=1}^{S}{\left(EEG\left(g,i\right)-AFEEG\left(g,i\right)\right)}^{2}}{S}\right)}^{\frac{1}{2}}$$
(8)

where EEG = true EEG, AFEEG = artifact-free EEG data calculated using artifact removal method, S = number of samples, and g defines channel number.
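Eq. 8 can be evaluated for all channels at once, as in the sketch below. The function name and the (channels × samples) array layout are assumptions made for illustration.

```python
import numpy as np

def rmse_per_channel(eeg_true, eeg_clean):
    """RMSE of Eq. 8 for every channel g; inputs are (channels x samples) arrays."""
    eeg_true = np.asarray(eeg_true, dtype=float)
    eeg_clean = np.asarray(eeg_clean, dtype=float)
    if eeg_true.shape != eeg_clean.shape:
        raise ValueError("both inputs must have the same shape")
    # Mean over the sample axis, then square root, one value per channel.
    return np.sqrt(np.mean((eeg_true - eeg_clean) ** 2, axis=1))
```

A channel that is recovered perfectly gives an RMSE of zero, and a constant offset of 3 units on every sample gives an RMSE of 3.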

3.3.2 Energy ratio of delta rhythms (ER δ )

For real EEG dataset, this parameter was calculated for artifact-removed as well as contaminated EEG signal using the following equation:

$${ER}_{\delta }\left(\%\right)= \frac{{E}_{\delta }}{{E}_{Total}}\times 100$$
(9)

where Eδ is the energy of delta rhythm and ETotal is the total energy of EEG signal [24].
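One way to estimate Eq. 9 is from the discrete Fourier transform of the signal, as sketched below. The delta band limits of 0.5 to 4 Hz and the function name are assumptions, since the text does not state the band edges or the estimation method used.

```python
import numpy as np

def delta_energy_ratio(x, fs, band=(0.5, 4.0)):
    """Percentage of signal energy inside `band` (assumed delta band), per Eq. 9."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # One-sided spectrum: double every bin except DC and, for even lengths, Nyquist.
    weights = np.full(freqs.size, 2.0)
    weights[0] = 1.0
    if x.size % 2 == 0:
        weights[-1] = 1.0
    energy = weights * np.abs(np.fft.rfft(x)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return 100.0 * energy[in_band].sum() / energy.sum()
```

Because ocular artifacts are concentrated at low frequencies, a drop in this ratio after artifact removal is the behaviour the metric is meant to capture.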

4 Results and discussion

In this section, the effectiveness of the proposed SNOAR method for ocular artifact removal was tested on semi-simulated and real datasets. A comparison of the proposed method with existing methods such as ICA, wICA, improved wICA, and MEMD was also carried out.

For the implementation of wICA [10], FastICA-based decomposition was applied to the EEG data matrix. A mixing matrix and independent components were obtained. Each IC was decomposed into 5 levels using “sym4” wavelet basis function. After performing thresholding on wavelet components, inverse wavelet transform was carried out to combine independent components consisting of artifact-free neural sources. Finally, remixing of ICs provided artifact-free EEG signals. The implementation of improved wICA was done as in [27]. Similar to wICA, EEG signals were first decomposed into independent components. An automatic EOG component identification was performed based on the correlation of each IC with frontal channels. Next, these identified components were searched for segments having EOG peaks based on the specified amplitude (peak amplitude ≥ 3 × mean amplitude of IC component) and time constraint (at least 0.5-s duration between 2 peaks). The segments with EOG peaks were decomposed into 5 levels using “sym4” basis, and only high-frequency components were retained for signal reconstruction. Finally, an inverse ICA process was performed to obtain artifact-free signals. For rejection ICA-based method [27], ICs consisting of artifacts were identified using visual inspection and scalp maps with the help of the EEGLAB software, and other ICs were remixed to provide artifact-free data.

MEMD-based artifact removal was implemented as presented in [12]. MEMD decomposition was firstly applied on a matrix consisting of EEG channels data, reference EOG (VEOG), and reference signal data (fractional Gaussian noise (fGn) with Hurst index of 0.7, mean = 0, standard deviation = 1). With the help of IMF’s energy distribution of fGn signal and EOG channel, a reference IMF was selected. Next, IMF belonging to the EEG channel having the lowest difference in the mean time period from reference IMF was considered threshold IMF. The artifact-free EEG channel was obtained by adding all IMFs with an index less than the threshold.

4.1 Simulation results using semi-simulated data

Figure 5 shows the contaminated EEG signals recorded from FP1 and F7 channels/electrodes and respective VEOG and HEOG signals. It can be seen from Fig. 5 that EEG recordings from FP1 and F7 channels are contaminated by VEOG and HEOG components/signals. It is due to the impact of vertical eye movement on the prefrontal channels (FP1 and FP2), which are close to the left and right eyes; similarly, the influence of lateral eye movement is visible in the channels (F7 and F8), which are close to eyes in the frontal region. Hence, these four electrodes/channels were considered reference channels for estimating EOG artifacts using SSA.

Fig. 5 a FP1 channel contaminated from Klados dataset 1. b F7 channel contaminated from Klados dataset 1. c HEOG data. d VEOG data

For the Klados dataset, the window length was chosen based on the criterion L ≥ fs/fm and was set to 400, with fs = 200 Hz and fm = 0.5 Hz (because EOG artifacts are present in the 0.5–5 Hz frequency range). The EOG artifacts estimated from the FP1 and F7 channels of subject 1 (Klados dataset), along with the actual artifacts, are shown in Fig. 6.

Fig. 6 Comparison of the real artifact present in the FP1 and F7 channels of Klados dataset 1 and the artifact calculated through SSA. a Real artifact present in FP1 channel. b Calculated artifact using SSA in FP1 channel. c Artifact present in F7 channel. d Calculated artifact using SSA in F7 channel

The effectiveness of SSA-based estimation of EOG artifacts from the contaminated multi-channel EEG signals can be seen in Fig. 6. As reported in [12] and as indicated by the simulation results presented in this work, SSA is a suitable candidate for removing EOG artifacts. From the EOG artifact estimated using SSA, the VEOG and HEOG components were separated by NMF. The estimated VEOG and HEOG components are reported in Fig. 7. The figure shows the satisfactory performance of NMF in separating the VEOG and HEOG components from the EOG artifacts.

Fig. 7 a Actual VEOG from Klados dataset. b VEOG component calculated through SNOAR. c Actual HEOG from Klados dataset. d HEOG component calculated through SNOAR

After separating VEOG and HEOG components, a time domain linear regression approach was used to obtain clean EEG signals from the contaminated EEG signals. Regression coefficients were estimated using the least square estimation method.

In this work, only the recordings from the frontal EEG electrodes (FP1, FP2, F7, and F8) were used to estimate the EOG artifact because they are the most likely to be corrupted by EOG artifacts. The performance of the proposed method in terms of RMSE for the Klados dataset is tabulated in Tables 1 and 2. In the SSA-based EOG artifact removal method, the RMSE values were calculated by using SSA to estimate the EOG artifacts individually on all the channels and removing them from the contaminated EEG. In the proposed method, VEOG and HEOG were estimated and subtracted from all the contaminated EEG channels using a linear regression approach. The average execution time to remove the artifact from all channels using the SSA-based method was 544.7 s, whereas the proposed method takes only 134.71 s. From the average execution time, it can be said that the proposed method takes less computational time to estimate EOG artifacts and remove them from the contaminated EEG signal. From Table 1, it can be inferred that the proposed EOG artifact removal method performed better than SSA in terms of RMSE for the EEG signals from the selected subjects.

Table 1 Values of RMSE for SSA and SNOAR method using Klados dataset
Table 2 Values of RMSE for contaminated data, rejection ICA, wICA, MEMD, improved wICA, and SNOAR method using Klados dataset

The performance comparison of the proposed method with the selected EOG artifact removal methods (ICA, MEMD, wICA, and improved wICA) is reported in Table 2.

The RMSE results of contaminated EEG, wICA, and improved wICA were taken from the work of M. F. Issa and Z. Juhasz [27], and MEMD was implemented following the work of M. K. I. Molla [13]. In the improved wICA method, common average referencing and filtering (1–47 Hz) were performed before artifact removal. Therefore, the results of the proposed method were calculated both with and without filtering and common average referencing. It can be observed from Table 2 that the proposed SNOAR with common average referencing and filtering yielded a lower RMSE than the selected EOG artifact removal methods.
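The pre-processing mentioned above can be sketched as follows. The filter type and order are not given in this section, so a fourth-order zero-phase Butterworth band-pass is assumed purely for illustration, and the function names are hypothetical. The sampling frequency `fs` must be supplied by the caller.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def common_average_reference(eeg):
    """Subtract the across-channel mean from every channel at each sample.

    eeg : array (n_channels, n_samples)
    """
    eeg = np.asarray(eeg, dtype=float)
    return eeg - eeg.mean(axis=0, keepdims=True)

def bandpass(eeg, fs, low=1.0, high=47.0, order=4):
    """Zero-phase Butterworth band-pass along the time axis.

    The Butterworth design and the order are assumptions of this sketch.
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(eeg, dtype=float), axis=-1)
```

Both operations are linear, so the order in which they are applied does not change the result.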

To study any change in the spectral characteristics of the signal after artifact removal, PSD graphs of the EEG signals before and after artifact removal are reported in Fig. 8.

Fig. 8

PSD graph of contaminated EEG and artifact-free EEG obtained after the proposed methods and selected methods for a F3 and b FP1 channel

In simulated data, an original artifact-free EEG signal is available. Therefore, PSD plots of the contaminated, original clean, and filtered EEG signals of the frontal channels FP1 and F3 of dataset 1 were plotted to show the similarity of the filtered signals to the original clean EEG signal. Among all methods, the proposed SNOAR method showed the highest similarity with the original clean EEG signal, indicating that it removes the artifact efficiently with little effect on other cerebral activity.

A large difference can be observed between the delta band PSDs of the contaminated EEG and the filtered EEG signals. Among all methods, MEMD shows the largest decrease in delta band PSD. However, its values are even lower than the PSD values calculated from the original clean EEG signal, indicating that MEMD removes some of the neural activity of the contaminated EEG in the delta band. For the proposed method, the difference in PSD between the contaminated and filtered EEG in the other bands was negligible.

4.2 Simulation results using real dataset

For real EEG data analysis, EEG signals recorded during imagination of the phonemes /uw/, /tiy/, /pat/, and /n/ from the KARA ONE dataset were used. The KARA ONE dataset was recorded using a 64-channel Neuroscan headset at a sampling frequency of 1 kHz. For pre-processing, the data were down-sampled to 250 Hz, band-pass filtered to 0.1–45 Hz, and re-referenced to a common average. For the KARA ONE dataset, the window length L was chosen based on the criterion L > fs/fm and was set to 500, with fs = 250 Hz and fm = 0.5 Hz (because EOG artifacts are present in the 0.5–5 Hz frequency range). The HEOG and VEOG artifacts estimated using SSA + NMF for subject 9 and phoneme /n/ (KARA ONE dataset), along with the actual artifacts, are shown in Figs. 9 and 10. Figs. 9 and 10 indicate that the proposed SNOAR method effectively estimates the VEOG and HEOG components.
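The window-length rule above gives fs/fm = 250/0.5 = 500 samples, which is the value adopted for L. The sketch below shows this arithmetic together with a basic single-channel SSA (embedding, SVD, and diagonal averaging). Reconstructing from the leading singular components is an assumption made for illustration; the grouping rule actually used to select EOG-related components is not described in this section, and the function names are hypothetical.

```python
import numpy as np

def window_length_bound(fs, fm):
    """Lower bound fs/fm on the SSA window length L."""
    return fs / fm

def ssa_reconstruct(x, L, n_components):
    """Basic SSA: embed, decompose, and reconstruct from leading components.

    x            : 1-D signal of length N
    L            : window length (number of rows of the trajectory matrix)
    n_components : number of leading singular components to keep (assumed)
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    K = N - L + 1
    # Trajectory (Hankel) matrix of shape (L, K): X[i, j] = x[i + j].
    X = np.lib.stride_tricks.sliding_window_view(x, L).T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
    # Diagonal averaging: average all entries of Xr that map to the same sample.
    recon = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        recon[i:i + K] += Xr[i]
        counts[i:i + K] += 1
    return recon / counts

print(window_length_bound(250, 0.5))  # 500.0
```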

Fig. 9

a HEOG reference present in /n/ phoneme recording of S9 subject of KARA ONE dataset. b HEOG component calculated using SNOAR

Fig. 10

a VEOG reference present in /n/ phoneme recording of S9 subject of KARA ONE dataset. b VEOG component calculated using SNOAR

Figure 11 shows the contaminated EEG data and the artifact-free signals estimated using the SNOAR, MEMD, and wICA-based methods for the /n/ phoneme. The figure shows that the proposed method effectively removes high-voltage EOG activity while preserving the cerebral activity in the EEG signal, which suggests that it can remove ocular artifacts from EEG signals without compromising brain activity. Comparing the three approaches in Fig. 11, the loss of cerebral activity present in the contaminated EEG signals is smallest after artifact removal using the proposed method.

Fig. 11

Artifact-contaminated signal and artifact-removed signal of /n/ phoneme using SNOAR, MEMD, and wICA

Artifact-free EEG data are not available for the real dataset. As a result, RMSE values cannot be calculated; instead, a non-reference quality measure, the shift in the delta rhythm energy ratio, was used. As most of the ocular activity is present in the delta band, removing the ocular artifact decreases the delta band energy ratio. Table 3 shows the energy ratio of the delta rhythm for the contaminated EEG signals and the artifact-removed EEG signals for various phonemes of subject 9. After applying the proposed SNOAR method, the delta rhythm energy ratio for the selected phonemes was significantly lower than that of the contaminated EEG and of every compared method (ICA, wICA, and improved wICA) except MEMD. MEMD yields an even lower delta band energy ratio because of the significant loss of cerebral activity in the delta band after artifact removal, as shown in the PSD plots in Fig. 12.
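A delta band energy ratio of this kind can be sketched as below. The exact definition of ERδ is not given in this section, so the code assumes one plausible form: the spectral energy in the delta band divided by the energy in the full analysis band, both taken from an FFT periodogram. The band edges (0.5–4 Hz and 0.5–45 Hz) and the function name `band_energy_ratio` are assumptions.

```python
import numpy as np

def band_energy_ratio(x, fs, band=(0.5, 4.0), total=(0.5, 45.0)):
    """Ratio of spectral energy inside `band` to energy inside `total`.

    x  : 1-D signal
    fs : sampling frequency in Hz
    The band limits are illustrative defaults, not values from the paper.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the DC offset before computing the spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    in_total = (freqs >= total[0]) & (freqs <= total[1])
    return float(power[in_band].sum() / power[in_total].sum())
```

Under this definition, a lower ratio after artifact removal indicates that delta band energy was reduced relative to the rest of the spectrum. On its own it cannot distinguish removed ocular activity from removed cerebral activity, which is why it is read alongside the PSD plots.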

Table 3 ERδ for KARA ONE dataset
Fig. 12

PSD graph of contaminated EEG and artifact-free EEG obtained after the proposed method and selected methods for a F3 and b FP1 channel

4.3 Discussion

Removing undesired activity from EEG signals, such as ocular artifacts, is an essential pre-processing step before analyzing EEG information for applications such as BCI. A combination of SSA-NMF with linear regression was proposed in this work. The performance and effectiveness of the proposed method were evaluated against existing methods such as ICA in terms of RMSE, delta band energy ratio, and PSD graphs using a semi-simulated EEG dataset and a real EEG dataset. The availability of the original clean EEG signals in a semi-simulated dataset is an additional advantage when measuring the effectiveness of artifact rejection methods, whereas real EEG datasets show how the proposed method performs in practical cases. Therefore, both datasets were used for evaluating the proposed method. In addition, the Klados dataset has only 19 electrodes, whereas the KARA ONE dataset consists of 64-channel EEG recordings. Analyzing both datasets therefore tests the effectiveness of the artifact removal method in both low and high channel-count scenarios.

In the Results section, Figs. 7 and 9 show the VEOG/HEOG signals recorded using EOG electrodes alongside those calculated using SSA-NMF, which demonstrates the capability of the SSA-NMF approach in calculating HEOG and VEOG signals.

For the semi-simulated Klados dataset, the proposed method provides the lowest RMSE values compared to other methods. The PSD graphs of Fig. 8 also demonstrate the close similarity of the artifact-removed signal using the proposed method with the original clear EEG signal. Both results confirm the effectiveness of the proposed method in removing artifacts without affecting desired cerebral activity.

Figure 11 shows the contaminated EEG signals and the artifact-free EEG signals obtained using the SNOAR, MEMD, and wICA methods for the real dataset, and indicates that the proposed method effectively removes artifacts. The PSD graphs of Fig. 12 support this claim: except in the delta band (where most of the ocular artifacts are present), the artifact-free signals obtained using the proposed method are nearly identical to the contaminated signals. This paper proposes a new method for eliminating ocular artifacts from contaminated multi-channel EEG recordings. The method can remove ocular artifacts without causing large distortion in the desired cerebral activity and works effectively for both high and low numbers of electrodes.

For both datasets, four electrodes from the brain's frontal region are required for SSA-based estimation of ocular artifacts. The performance of the proposed method in the absence of frontal electrodes has not been tested. In addition, if the number of electrodes in the EEG data is less than 5, only the first step of the proposed method (SSA-based artifact calculation) is sufficient, and the complete method is not needed. The proposed method requires more time for artifact removal than traditional regression because the VEOG and HEOG channels must be calculated, but it removes the requirement for additional EOG channels.

5 Conclusions and future scope

This work proposes a combination of SSA, NMF, and a linear regression approach to remove ocular artifacts from multi-channel EEG recordings. The proposed method does not require additional channels to record VEOG and HEOG components. Instead, it estimates the VEOG and HEOG components, which can serve as reference signals during regression-based ocular artifact removal. Researchers increasingly use EEG systems with a smaller number of channels since these are cheaper and easier to set up and maintain than high-density systems. With fewer electrodes and no EOG channels, removing ocular artifacts becomes a challenging task. Therefore, the proposed method was evaluated using datasets with both small and large numbers of channels. It can be concluded that the proposed method could effectively eliminate the EOG artifacts in semi-simulated and real EEG signals while preserving useful cerebral activity to a reasonable extent. The proposed method outperformed the selected EOG artifact removal methods (ICA, wICA, MEMD, and improved wICA) in terms of the performance measures used (RMSE and delta band energy ratio) and differences in PSDs. In the future, the proposed combination of SSA and NMF for the estimation and suppression of VEOG and HEOG components can be further investigated using variants of SSA and NMF.