1 Introduction

Preprocessing of the EEG signal is an indispensable step for EEG analysis in most circumstances. Although a standard pipeline for EEG preprocessing is still lacking [8, 37, 58], preprocessing generally includes any digital signal processing operations needed to clean up raw EEG signals, with the aim of leaving only brain activity for subsequent analyses. Often, EEG preprocessing also involves procedures to enhance spatiotemporal characteristics of the EEG signal related to the task used in a study [65].

A number of studies have demonstrated the influence of EEG preprocessing on subsequent data analysis results [8, 33, 90, 110, 112]. For instance, the classification of different mental states from EEG, or the control performance of a brain-computer interface (BCI), can depend on how the recorded EEG signals were preprocessed. Indeed, any analysis of EEG signals that contain significant noise and artifacts is likely to lead to misleading conclusions. Recent reports also emphasize the standardization of preprocessing routines for multi-site data collection across divergent experimental environments [8, 37].

At the center of EEG preprocessing lies the removal of unnecessary covert and overt components of the EEG signals. In this chapter, we denote such unnecessary components as noise and artifacts. Following a previous notion [65], noise is regarded as neurological activity irrelevant to the examined behavioral task, whereas artifacts are regarded as originating from sources unrelated to neurological activity, such as eye movements, respiration or electrical interference. As most EEG preprocessing techniques concentrate on removing artifacts, we likewise narrow our focus to the methods used to eliminate artifacts and clean up the EEG signals. Note that the topics covered by this chapter do not include the extraction of features from the EEG signals for particular applications, which should be discussed separately.

This chapter begins with a description of early-stage procedures to remove basic artifacts, sort out contaminated channels and, where necessary, adjust references. It then discusses a range of methods to remove artifacts from the EEG signals, followed by a brief discussion of EEG preprocessing.

2 Early-Stage Preprocessing

Early-stage EEG preprocessing involves fundamental, largely semi-automated signal processing operations. It is distinguished from common artifact removal procedures in that this stage of preprocessing is largely independent of any specific artifact. This section describes key parts of early-stage preprocessing, including the removal of line noise, referencing and the elimination of bad channels. Before describing them, however, it is worth reviewing the characteristics of background EEG.

2.1 Characteristics of Background EEG

A basic and brief summary of the characteristics of background EEG activity is as follows [104]. The frequency range of EEG is reportedly limited to approximately 0.01–100 Hz. The amplitude of EEG generated by the brain typically ranges within ±100 μV. The power spectral density of EEG is known to follow a power law [44]. Background brain rhythms are present in EEG and are generally classified by oscillatory frequency into five disjoint bands: delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz) and gamma (30–100 Hz). More details on the implications and functions of these rhythms can be found in other resources (e.g. see [10, 41, 63, 98]).

Because genuine (noise-free) EEG cannot be measured directly, it is reasonable to treat the EEG signal as stochastic [93]. In addition, over long periods, the EEG signal should be viewed as a non-stationary time series [57, 66]. However, EEG within a short time window can be regarded as approximately stationary, with stable statistical properties. The length of such a window containing stationary EEG varies with recording conditions, generally ranging from several seconds to minutes [51].

2.2 Line Noise Removal

Most efforts to eliminate line noise from the EEG signal rely on notch filtering at the power-line frequency (60 Hz in North America; 50 Hz in many other regions). A notch filter is typically implemented with a certain stopband width around 60 Hz (e.g. a width of 10 Hz). Consequently, notch filtering, although successful at removing line noise, can cause unintended distortions in signal components oscillating near the line frequency (e.g. between 55 and 65 Hz for a 10 Hz-wide notch). Also, the notch filter can reportedly generate a transient oscillation in baseline activity, leading to a potential issue in data interpretation [18]. Follow-up low-pass filtering with a cutoff frequency below 50 Hz may remedy this problem, but can in turn give rise to other issues such as alteration of the temporal structure of EEG [106] or spurious interactions between EEG channels [40].
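As a concrete illustration, a line-noise notch can be applied with standard signal processing tools; the sketch below uses SciPy's iirnotch with zero-phase filtering, where the sampling rate, notch frequency and quality factor are illustrative parameters rather than values prescribed by the studies cited above.

```python
from scipy.signal import iirnotch, filtfilt

def notch_filter(eeg, fs, f0=60.0, q=30.0):
    """Zero-phase notch filtering of multi-channel EEG (a minimal sketch).

    eeg: array (n_channels, n_samples); fs: sampling rate in Hz;
    f0: line frequency; q: quality factor (f0 / bandwidth), so q=30
    at 60 Hz gives a roughly 2 Hz-wide notch. Values are illustrative.
    """
    b, a = iirnotch(f0, q, fs=fs)
    # filtfilt runs the filter forward and backward for zero phase lag.
    return filtfilt(b, a, eeg, axis=-1)
```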

One suggestion to overcome this problem is to estimate the line noise embedded in the recorded EEG signals as precisely as possible and subtract it from the data [8, 80]. This method employs multi-taper decomposition to find line noise components in the signal. A short-time window slides over the course of the signal, within which a multi-taper transformation of the EEG time series is carried out [5]. This transformation can effectively estimate the spectral energy within each frequency band. Then, a regression model is applied to estimate the amplitude and phase of sinusoidal line noise (e.g. sinusoids at 60 Hz) in the transformed frequency domain. The Thomson F-test evaluates the significance of the magnitude of the estimated line noise. If the magnitude is significant, a time series of sinusoidal line noise is reconstructed. This process is repeated over the sliding windows, and the reconstructed line noise signal is subtracted from the original EEG signal. The entire process is repeated until the magnitude at the line frequency becomes non-significant (Fig. 2.1). In this way, line noise components can be removed without damaging background spectral components [83].

Fig. 2.1 Line noise removal using the multitaper transformation
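The core regression step can be conveyed with a simplified, single-window sketch that fits the amplitude and phase of a fixed-frequency sinusoid by least squares and subtracts it; the multi-taper spectral estimation, Thomson F-test and sliding-window machinery of the full method are omitted here.

```python
import numpy as np

def remove_sinusoidal_line_noise(x, fs, f0=60.0):
    """Estimate and subtract a fixed-frequency sinusoid from signal x.

    The amplitude and phase of the line-noise sinusoid are found by
    least squares against cosine/sine regressors at f0 Hz. A simplified,
    single-window sketch of the regression step described above.
    """
    t = np.arange(len(x)) / fs
    # Design matrix: cosine and sine at the line frequency.
    D = np.column_stack([np.cos(2 * np.pi * f0 * t),
                         np.sin(2 * np.pi * f0 * t)])
    coef, *_ = np.linalg.lstsq(D, x, rcond=None)
    return x - D @ coef   # subtract the reconstructed line noise

# Example: 2 s of a 10 Hz rhythm contaminated by 60 Hz line noise.
fs = 500.0
t = np.arange(int(2 * fs)) / fs
eeg = np.sin(2 * np.pi * 10 * t) + 0.8 * np.sin(2 * np.pi * 60 * t + 0.3)
clean = remove_sinusoidal_line_noise(eeg, fs)
```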

2.3 Referencing

We often subtract a reference signal (with the same time resolution as the recorded EEG signals) from the original EEG signal at each channel. The reference signal should remain stable relative to the EEG signals during the recording, such that the differences of the EEG signals from the reference can effectively represent brain activity related to a study. Typical choices of reference include a signal recorded at a mastoid channel, an EEG signal at a particular channel, the average of two mastoid signals, or the average over all EEG channels. In any case, it is strongly recommended that a researcher inspect a chosen reference signal carefully to ensure that its amplitude level is on par with those of the other EEG signals and that it is uncorrelated with task-induced brain activity.

Referencing to a mastoid channel has a potential problem in that it creates a single point of failure. If the contact at the mastoid becomes poor at any point during the recording, referencing to the mastoid can increase signal variance tremendously, resulting in irreversible contamination of the EEG data. The same problem exists for referencing to a particular EEG channel. Using the common average reference (CAR) may reduce the effect of a single-point failure [9], but it can still suffer from an outlier channel. One simple remedy is to detect and remove bad channels before computing the CAR [8]. There are other systematic re-referencing methods developed to address these issues, based on physical considerations and electrodynamics [38, 113, 114] or on statistical approaches [48, 69, 73].
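A minimal sketch of the CAR computation follows, with an optional mask to exclude previously detected bad channels from the average (the remedy mentioned above); function and argument names are illustrative.

```python
import numpy as np

def common_average_reference(eeg, good_mask=None):
    """Re-reference EEG to the common average (a minimal sketch).

    eeg: array (n_channels, n_samples).
    good_mask: optional boolean array marking channels to include in
    the average, so that known bad channels do not distort the reference.
    """
    if good_mask is None:
        good_mask = np.ones(eeg.shape[0], dtype=bool)
    car = eeg[good_mask].mean(axis=0, keepdims=True)
    return eeg - car
```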

2.4 Bad Channel Detection

It is often necessary to detect a noisy or bad channel that exhibits a contaminated EEG signal [8]. To detect a bad channel, we can screen each channel for EEG signals with excessively large amplitudes, using the robust z-score to flag extremes; for instance, a channel is marked bad when the robust z-score of its standard deviation exceeds a threshold. A bad channel can also be detected by investigating the correlation of a single channel with the others. Normal EEG recordings show across-channel correlations in the low-frequency components, so the correlation of one channel with other channels after low-pass filtering can reveal bad channels. If two bad channels happen to be correlated with each other, we can instead attempt to predict one channel from a randomly selected subset of the remaining channels. Finally, a contaminated channel often exhibits relatively large energy in high-frequency bands; thus, we can measure the ratio of high-frequency power to low-frequency power and flag a channel whose ratio exceeds a threshold.
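Two of these criteria, the robust z-score of per-channel amplitude and the high-to-low frequency power ratio, can be sketched as follows; the thresholds and the 40 Hz band split are illustrative assumptions rather than recommended values.

```python
import numpy as np
from scipy.signal import welch

def robust_z(values):
    """Robust z-score based on the median and median absolute deviation."""
    med = np.median(values)
    mad = np.median(np.abs(values - med)) * 1.4826  # ~std under normality
    return (values - med) / mad

def flag_bad_channels(eeg, fs, z_thresh=5.0, ratio_thresh=4.0, split_hz=40.0):
    """Flag bad channels by extreme amplitude and high-frequency power ratio.

    eeg: array (n_channels, n_samples). A minimal sketch of two of the
    criteria described above; correlation- and prediction-based checks
    are omitted. All thresholds are illustrative."""
    stds = eeg.std(axis=1)
    bad_amp = np.abs(robust_z(stds)) > z_thresh
    f, psd = welch(eeg, fs=fs, axis=1)
    hf = psd[:, f > split_hz].sum(axis=1)   # "high" band, illustrative split
    lf = psd[:, f <= split_hz].sum(axis=1)
    bad_ratio = (hf / lf) > ratio_thresh
    return bad_amp | bad_ratio
```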

Once detected, bad channels are replaced with virtual healthy channels created by interpolation from neighboring channels, in order to reconstruct the global brain responses [8, 31]. A number of interpolation schemes are useful for channel reconstruction, including spherical splines [87], higher-order polynomials [4], nearest-neighbor averaging [15] and radial basis functions [53]. Spherical splines allow accurate estimation of scalp potentials if the electrode montage is sufficiently dense [38, 97]. Interpolation using a statistical method such as radial basis functions has the advantage of cost-effectiveness, with a lighter computational load.
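As an illustration of the simplest scheme, the sketch below reconstructs a bad channel as an inverse-distance-weighted average of its nearest neighbors; spherical-spline interpolation, as noted above, is more accurate for dense montages.

```python
import numpy as np

def interpolate_bad_channel(eeg, pos, bad_idx, k=4):
    """Replace one bad channel by an inverse-distance-weighted average
    of its k nearest neighbors (a minimal nearest-neighbor sketch).

    eeg: (n_channels, n_samples), modified in place;
    pos: (n_channels, 3) electrode coordinates;
    bad_idx: index of the channel to reconstruct."""
    d = np.linalg.norm(pos - pos[bad_idx], axis=1)
    d[bad_idx] = np.inf                      # exclude the bad channel itself
    nn = np.argsort(d)[:k]                   # k nearest neighbors
    w = 1.0 / d[nn]                          # inverse-distance weights
    eeg[bad_idx] = w @ eeg[nn] / w.sum()
    return eeg
```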

3 Artifact Removal

In this section, we briefly review the potential sources of artifacts mixed into the EEG signal and the techniques to remove or reduce them. We primarily deal with artifact removal techniques, forgoing other steps of artifact management such as artifact detection. This does not mean that methods for artifact detection or artifact avoidance are less crucial than artifact removal; in fact, artifact removal is often accompanied by artifact detection for efficient processing of artifacts. A number of artifact detection methods have been proposed, to which interested readers can refer [3, 14, 32, 52, 81, 84].

3.1 Sources of Artifacts

The sources of EEG artifacts can be categorized into two classes: internal and external. The internal sources originate from the physiological systems of the body and include electromagnetic activities of the heart, eyes, muscles and so on. The external sources include all other signals from the environment that can contaminate EEG, such as wireless telecommunication signals, electrode attachment, recording equipment and cable movements [93]. Recently, the handling of external artifacts has become more important as EEG applications move out of laboratories toward in-home healthcare systems [100]. Still, the external sources, owing to their origins, can be suppressed once identified. The internal artifacts, on the other hand, physiologically permeate EEG, making them difficult to prevent in advance. Therefore, most artifact removal methods have focused on the internal artifacts, and here we likewise attend to the most pronounced internal artifacts handled by EEG artifact removal methods.

Ocular artifacts include electric activities generated by eye movements or eye blinks [22, 23]. Interference from ocular artifacts is strong enough to be visible in EEG waveforms, and EEG channels proximal to the eyes are the most vulnerable. Ocular artifacts can be detected by electrooculogram (EOG) measurements. EOG recorded simultaneously with EEG offers an opportunity to readily remove ocular artifacts from EEG, as it helps identify the true profiles of the artifacts. Once the waveforms of ocular artifacts are known, removal algorithms can subtract them from the EEG signal without rejecting contaminated EEG segments. To measure EOG for ocular artifact removal, it is recommended to record vertical (vEOG), horizontal (hEOG) and radial (rEOG) oculomotor signals [88].

Muscle artifacts include electric activities originating from muscle contraction in various body parts, including the face, head, neck and limbs. Compared to ocular artifacts, muscle artifacts take more varied forms, depending on the source muscles and related movements. The electrical signals associated with muscle artifacts can be measured by electromyogram (EMG). However, the widespread distribution of muscle sources over the body makes it challenging to identify the true profiles of the artifacts. In addition, the spectral properties of cranial muscle artifacts vary across sources, corrupting high-frequency EEG components as well as low-frequency ones [93, 105]. The spatial distribution of muscle artifacts is wider than that of ocular artifacts, being almost uniform over the entire scalp [44]. Temporal patterns of muscle artifacts are often associated with tasks, as movements of subjects naturally occur in response to task requirements [95]. Considering all these issues, removing muscle artifacts from EEG remains a significant challenge [76, 77, 95].

Cardiac artifacts originate from the electric activity of the heart, which can be measured by electrocardiography (ECG). Cardiac artifacts generally show low amplitudes compared to other artifacts. They have well-known regular characteristics that resemble epileptic EEG activity and can thus lead to incorrect seizure diagnoses [30]. From the perspective of removal algorithms, however, the regularity of cardiac waveforms makes them easier to correct in EEG. When an EEG electrode is positioned over a scalp artery, its contact with the skin can change periodically with the recurrent motion of the pulsating vessel, which likely produces rhythmic electric activity similar to EEG oscillations [68]. But this pulsation effect shows periodicity synchronous with the heartbeat, so it can be identified with the aid of ECG.

3.2 Artifact Removal Methods

Artifact removal methods aim to cancel or correct artifacts in EEG with minimal distortion of the brain signal. Here we briefly overview the computational methods to remove artifacts from EEG [52, 104], avoiding the details of the mathematical background underlying each method (e.g. blind source separation (BSS), regression, linear transformation of multivariate Gaussians, etc.). Overall, an EEG artifact removal method belongs to one of two groups: methods that correct a single channel independently and methods that process all channels together. The single-channel methods employ various techniques including linear regression, filtering, wavelet transform and empirical mode decomposition (EMD). The whole-channel methods are based on BSS, which estimates a set of hidden sources from an observed mixture of those sources with only limited information. Below we present several basic methods from both groups that have been most widely used in EEG studies.

3.2.1 Linear Regression

Assuming that artifact reference channels are available and faithfully capture the artifact waveforms, linear regression has been one of the main vehicles for canceling artifacts from the EEG signal, owing to its simplicity and ease of use. The basic procedure is to estimate the portion of EEG contaminated by artifacts using regression and to subtract the regressed portion from the contaminated EEG [22, 23, 45]. Linear regression assumes that an observed EEG signal is the sum of the original brain signal and a fraction of the artifact represented in the reference; it estimates this fractional factor from the observed EEG signal and the reference channel. The major drawbacks of linear regression are that one or more reference channels must be available (e.g. EOG or ECG), that it assumes a linear combination of EEG and artifacts whereas the EEG signal may possess nonlinear dynamics and non-stationarity, and that it applies well only to a few types of artifacts such as EOG and ECG. Nevertheless, when reference channels are available, linear regression remains an effective solution for removing artifacts [36, 107].
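The basic procedure can be sketched in a few lines: the per-channel propagation factor is the covariance of each EEG channel with the reference divided by the reference variance, and the scaled reference is subtracted. This is a minimal illustration, not the exact estimator of any cited study.

```python
import numpy as np

def regress_out_artifact(eeg, ref):
    """Subtract the least-squares projection of a reference channel
    (e.g. EOG) from each EEG channel (a minimal sketch).

    eeg: (n_channels, n_samples); ref: (n_samples,)."""
    ref = ref - ref.mean()
    eeg_c = eeg - eeg.mean(axis=1, keepdims=True)
    b = eeg_c @ ref / (ref @ ref)        # per-channel propagation factors
    return eeg - np.outer(b, ref)        # remove the regressed portion
```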

Linear regression methods work particularly well with ocular artifacts since EOG can be directly measured or indirectly inferred from EEG [13, 42]. However, simple subtraction of a regressed portion of ocular artifacts from EEG can also remove cerebral components, a problem termed bidirectional contamination [91]. Many methods have been proposed to address bidirectional contamination, among which the aligned-artifact average procedure demonstrates promising results in canceling artifacts from eye movements or blinks while minimizing the loss of EEG [21,22,23].

3.2.2 Filtering

Filters used for artifact removal build a statistical machine whose parameters are adaptively estimated according to certain objectives, learning rules and model structures, as well as the data. Three types of filters have been primarily adopted for EEG artifact removal [104].

Adaptive filters model the way artifacts contaminate the EEG signal by adjusting the filter weights according to a learning rule derived from an optimization algorithm [47]. They assume no correlation between the EEG signal and the artifacts. For example, let x[n] be an observed EEG signal composed of an unknown clean EEG signal y[n] and an additive artifact signal z[n] (i.e. x[n] = y[n] + z[n]). If a reference to the artifact, r[n], is available, the adaptive filter adjusts its weights, w, to minimize the error between x[n] and \(w^{T} r[n]\). Since r[n] is assumed to be uncorrelated with y[n], the optimal weights make \(w^{T} r[n]\) as close to z[n] as possible; the difference \(x[n] - w^{T} r[n]\) then becomes close to y[n] (Fig. 2.2). Many learning algorithms are available to adjust the weights, including least mean squares (LMS) and recursive least squares (RLS) [47]. Adaptive filters have been shown to outperform linear regression because the proportion factors are less constrained [91]. However, as with linear regression, adaptive filters still require reference channels.

Fig. 2.2 EEG denoising with adaptive filtering and reference to artifacts
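A minimal LMS sketch with an artifact reference might look as follows; the filter order and step size mu are illustrative and would need tuning in practice.

```python
import numpy as np

def lms_artifact_filter(x, r, order=5, mu=0.01):
    """Least-mean-squares adaptive filtering with an artifact reference.

    x: contaminated EEG (n_samples,); r: artifact reference, e.g. EOG.
    The weights w are adapted so that w^T r[n] tracks the artifact
    component of x; the cleaned signal is the residual x[n] - w^T r[n].
    A minimal sketch of the scheme in Fig. 2.2."""
    w = np.zeros(order)
    y = np.zeros_like(x)
    for n in range(order, len(x)):
        r_vec = r[n - order:n][::-1]      # most recent reference samples
        e = x[n] - w @ r_vec              # error = estimated clean EEG
        w += 2 * mu * e * r_vec           # LMS weight update
        y[n] = e
    return y
```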

The Wiener filter is a linear time-invariant (LTI) filter that minimizes the mean squared error between the desired response and the filter output [47]. The optimal weights of the filter are estimated from the Wiener-Hopf equation. Learning the weights is done offline with training samples that contain both EEG and artifact signals. Having learned its weights, the Wiener filter can operate on contaminated EEG signals without a reference. However, the Wiener filter's performance may deteriorate if the proportion of EEG contaminated by artifacts changes over time (i.e. under non-stationarity).
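Under these definitions, the Wiener-Hopf solution can be sketched as below, where the autocorrelation matrix R of the input and the cross-correlation vector p with the desired response are estimated from training data; the FIR order is illustrative.

```python
import numpy as np
from scipy.linalg import toeplitz, solve
from scipy.signal import lfilter

def wiener_weights(x, d, order=8):
    """Solve the Wiener-Hopf equation w = R^{-1} p for an FIR filter.

    x: filter input from training data (e.g. contaminated EEG);
    d: desired response (e.g. the clean EEG available during training).
    A minimal sketch; correlations are estimated from the training data."""
    n = len(x)
    # Autocorrelation of the input at lags 0..order-1 (rows/cols of R).
    ac = np.correlate(x, x, 'full')[n - 1:n - 1 + order] / n
    # Cross-correlation p[k] ~ E[d[n] x[n-k]].
    cc = np.correlate(d, x, 'full')[n - 1:n - 1 + order] / n
    return solve(toeplitz(ac), cc)

# Apply the learned FIR weights to new contaminated data:
# clean_estimate = lfilter(w, 1.0, x_new)
```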

Bayesian filters, in linear or nonlinear forms, can overcome some shortcomings of both linear regression and the Wiener filter, as they sequentially update the states online without the need for reference channels. Here the states approximate the unknown clean EEG signals. The system model in a Bayesian filter describes the sequential transition of clean EEG according to a first-order Markov process, and the observation model relates the observed, contaminated EEG to the hidden clean state; the posterior probability distribution of the clean EEG is updated after each observation using a likelihood model and Bayesian approximation. The parameters of the system and observation models need to be learned from training data, as in the case of the Wiener filter. Although estimating probability distributions is computationally expensive in general, under certain assumptions Bayesian filters reduce to simpler forms such as the Kalman filter or the particle filter. In particular, the Kalman filter has been widely applied to artifact removal for EEG [50, 59, 82].
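A scalar Kalman filter with a random-walk system model gives the flavor of the approach; the noise variances below are illustrative assumptions, whereas practical methods learn richer system and observation models from training data.

```python
import numpy as np

def kalman_denoise(x, q=1e-4, r_var=1e-1):
    """Scalar Kalman filter treating clean EEG as a random-walk state.

    x: contaminated EEG (n_samples,); q: process-noise variance of the
    state transition; r_var: observation-noise (artifact) variance.
    A minimal sketch with a first-order Markov system model."""
    s, p = x[0], 1.0                 # state estimate and its variance
    out = np.empty_like(x)
    for n, obs in enumerate(x):
        p += q                       # predict step (random-walk state)
        k = p / (p + r_var)          # Kalman gain
        s += k * (obs - s)           # update with the innovation
        p *= 1.0 - k
        out[n] = s
    return out
```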

3.2.3 Wavelet Transform and Empirical Mode Decomposition

EEG denoising can be achieved by decomposing a single-channel EEG signal into a set of fundamental basis signals, on the premise that some basis signals may contain only artifact information. We can then identify those artifact-related basis signals and remove them from the decomposed set. Two representative decomposition methods are presented below.

Wavelet transform convolves a given signal with scaled and shifted versions of a mother wavelet function, yielding a set of coefficients corresponding to each scale and time shift. Each coefficient represents the similarity between a segment of the signal and the mother wavelet at a given scale. The discrete wavelet transform (DWT) is derived from the continuous wavelet transform by discrete-time sampling. A basic DWT procedure filters the signal with a low-pass and a high-pass filter, where the low-pass filter corresponds to the scaling function and the high-pass filter to the mother wavelet function [52]. The low-pass filtered output is then passed to the next level of low- and high-pass filtering. This procedure is repeated over K levels and yields one set of approximation coefficients and K sets of detail coefficients, where the approximation coefficients come from the final low-pass filtering and the detail coefficients from the high-pass filtering at each of the K levels. For denoising, a threshold is applied to the detail coefficients to sort out those with small magnitudes, drawing on the hypothesis that the signal is strongly correlated with a properly chosen mother wavelet basis at some levels whereas artifacts are not [104]. Finally, the artifact-reduced signal is reconstructed from the refined detail coefficients and the approximation coefficients [94]. Systematic ways of selecting a threshold can be found elsewhere [34].
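A minimal sketch using the PyWavelets package follows; the choice of wavelet, decomposition depth and the universal threshold are illustrative assumptions, and principled threshold selection schemes are discussed in [34].

```python
import numpy as np
import pywt

def dwt_denoise(x, wavelet='db4', level=5):
    """Wavelet-threshold denoising of a single-channel signal (a sketch).

    Decomposes x with a K-level DWT, soft-thresholds the detail
    coefficients with the universal threshold, and reconstructs."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Noise scale estimated from the finest detail level; universal threshold.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    coeffs[1:] = [pywt.threshold(c, thr, mode='soft') for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(x)]
```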

Empirical mode decomposition (EMD) is a data-driven technique that decomposes a signal into a sum of band-limited basis functions called intrinsic mode functions (IMFs) [49]. The IMFs have zero mean and are amplitude- and frequency-modulated. EMD has been shown to perform well with nonlinear and non-stationary signals. If different subsets of IMFs separately represent the signal and the artifacts, a clean EEG signal can be reconstructed by removing the artifact-related IMFs from the decomposed set. EMD has been successfully applied to EEG artifact removal [70, 94, 115]. More advanced variants that overcome shortcomings of EMD (e.g. low robustness against noise, lack of a mathematical foundation), including ensemble EMD (EEMD) [99, 116] and multivariate EMD (MEMD) [108], have also been adopted for artifact removal.
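Because EMD is additive, removing artifact-related IMFs amounts to summing the retained ones. The sketch below assumes the IMFs have already been computed by some EMD implementation (for instance the PyEMD package) and that the artifactual indices have been identified separately.

```python
import numpy as np

def reconstruct_without_artifact_imfs(imfs, artifact_idx):
    """Rebuild a clean signal from an EMD decomposition (a sketch).

    imfs: array (n_imfs, n_samples) as produced by an EMD implementation;
    artifact_idx: indices of the IMFs judged to carry artifacts.
    Since EMD is additive, the clean signal is the sum of retained IMFs."""
    drop = set(artifact_idx)
    keep = [i for i in range(imfs.shape[0]) if i not in drop]
    return imfs[keep].sum(axis=0)
```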

3.2.4 Blind Source Separation

Blind source separation (BSS) has been most widely used for artifact removal when information about the artifacts is limited, for instance when no reference is provided. The basic BSS methods used for artifact removal assume a linear mixture model in which the observed multi-channel EEG signals are a linear mixture of unknown sources, with little knowledge about the sources or the mixing matrix. Estimation of the sources and the mixing matrix therefore relies on certain assumptions on the sources, such as mutual independence or uncorrelatedness. For instance, let x be an observed EEG signal vector, which is a mixture of an unknown source vector s with a mixing matrix A, given by:

$${\mathbf{x}} = A{\mathbf{s}} + {\mathbf{n}}$$
(2.1)

where n denotes additive white noise.

Then, BSS methods estimate A so as to make the sources in s as independent as possible. Once the estimate of A is obtained, its inverse matrix \(W = A^{-1}\) is used to recover the sources:

$${\mathbf{s}} = W{\mathbf{x}}.$$
(2.2)

These estimated sources are then inspected either empirically (e.g. by visual inspection) or automatically (by automatic source selection algorithms [109, 111, 119]) to identify artifact-related sources. The reduced set of sources remaining after removal of the artifactual ones is then used to reconstruct artifact-free EEG data using A.

Despite its prevalence in EEG preprocessing, BSS has the limitations that it requires multi-channel EEG data and that removed sources may also carry information about brain activity. In addition, researchers should take into consideration the assumptions under which each BSS method operates, including independence, uncorrelatedness and non-Gaussianity [54, 71]. Nevertheless, a variety of BSS methods have been successfully applied to remove artifacts from biomedical signals. Several methods widely used for EEG artifact removal are described below.

Independent component analysis (ICA) is a BSS method based on the assumptions of mutual statistical independence of the sources and their non-Gaussianity [7]. ICA algorithms rely on either second-order or higher-order statistics [54]. Algorithms based on higher-order statistics estimate W by maximizing the statistical independence of the probability density functions of the individual sources using mutual information or negentropy [7, 19]. Algorithms based on second-order statistics estimate W by decorrelating the time-series data, as in second-order blind identification (SOBI) [20, 103]. ICA has been reported to perform well in EEG artifact removal owing to its reasonable assumption of statistical independence between the EEG signals and artifacts (e.g. see [2]). However, to exploit statistical independence, ICA needs a sufficient amount of EEG data [56]. Also, ICA works best when the artifacts and the EEG signals remain stationary during the analyzed period, which may not hold in general. To ensure stationarity, studies have suggested an epoch of 10 s or less, or a sample size on the order of multiples of \(\sqrt C\) where C is the number of channels [56, 92]. When only a limited number of data samples are available, studies have suggested using ICA algorithms based on second-order statistics [28, 55].
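A minimal sketch using scikit-learn's FastICA, one higher-order-statistics implementation, follows; identifying which source indices are artifactual (by visual inspection or an automatic selector) is assumed to have been done separately.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_artifact_removal(eeg, artifact_idx, n_components=None):
    """Remove artifact components with FastICA (a minimal sketch).

    eeg: (n_channels, n_samples); artifact_idx: indices of the estimated
    sources identified as artifactual. The data are unmixed, artifact
    sources are zeroed, and the sensor signals are remixed."""
    ica = FastICA(n_components=n_components, random_state=0)
    sources = ica.fit_transform(eeg.T)        # (n_samples, n_sources)
    sources[:, artifact_idx] = 0.0            # zero out artifact sources
    cleaned = sources @ ica.mixing_.T + ica.mean_
    return cleaned.T
```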

Principal component analysis (PCA) has been proposed as a means to remove artifacts from EEG [39, 67, 102]. PCA transforms presumably correlated multi-channel EEG data into mutually uncorrelated principal components (PCs) that preserve as much of the variance of the EEG data as possible. A subset of PCs can represent artifacts if the artifacts and brain signals are uncorrelated with each other. PCA also assumes jointly normal distributions of the data. In practice, it suffers from the restrictive assumption that sources, including brain activities, are orthogonal to each other [39]. Hence, PCA is now seldom used directly for artifact removal but instead serves other essential preprocessing roles such as whitening [35].

Canonical correlation analysis (CCA) has also been used extensively for artifact removal from EEG [29, 43, 118]. Basically, CCA seeks canonical variables that maximize the correlations between two multivariate datasets. For EEG denoising, CCA finds canonical variables between the original data and a time-shifted version of it (typically one step behind). In doing so, the canonical variables inferred in sequence represent autocorrelation from the highest to the lowest. By assuming that brain activities are more correlated in time than artifacts, CCA identifies and removes the canonical components with the lowest autocorrelations, which likely correspond to artifacts. The advantages of CCA over ICA are that it takes the temporal correlations of the signals into account and requires fewer computational resources [52].
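The delayed-signal scheme can be sketched with scikit-learn's CCA as below; this is illustrative only (dedicated BSS-CCA implementations are normally used in practice), and the number of removed components is an assumption.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_artifact_removal(eeg, n_remove=2):
    """Remove the least-autocorrelated canonical components (a sketch).

    Canonical variates are computed between the data and a one-sample
    delayed copy; the variates with the lowest autocorrelation are
    assumed artifactual and zeroed; sensor signals are rebuilt by least
    squares. eeg: (n_channels, n_samples)."""
    X = eeg.T[:-1]                               # (n_samples-1, n_channels)
    Y = eeg.T[1:]                                # one-sample delayed copy
    cca = CCA(n_components=X.shape[1])
    U, V = cca.fit_transform(X, Y)
    # Autocorrelation of each canonical variate pair.
    ac = np.array([np.corrcoef(U[:, i], V[:, i])[0, 1]
                   for i in range(U.shape[1])])
    # Least-squares map from variates back to (centered) sensor space.
    mean = X.mean(axis=0)
    M, *_ = np.linalg.lstsq(U, X - mean, rcond=None)
    U[:, np.argsort(ac)[:n_remove]] = 0.0        # drop low-autocorrelation variates
    return ((U @ M) + mean).T                    # one sample shorter than input
```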

Besides the three BSS methods described above, other BSS methods have recently been proposed for EEG artifact removal. Morphological component analysis (MCA) can separate artifacts from EEG if a morphological template of the target artifacts is available [96]. Singular spectrum analysis (SSA) is a projective subspace method that projects a single-channel EEG signal onto a higher-dimensional space by time embedding, decomposes the embedded signal vector into uncorrelated components and reconstructs the EEG signal by projecting the embedded signals onto the directions with large eigenvalues [24, 25, 101]. The sparse time artifact removal (STAR) algorithm identifies and removes artifactual components of EEG that are sparse in both space and time [27].

3.2.5 Hybrid Artifact Removal Methods

Recent studies have proposed hybrid approaches to EEG artifact removal that combine two or more artifact removal algorithms. Many studies blend one algorithm from the BSS family with a decomposition algorithm (e.g. wavelet transform or EMD). A hybrid method can be characterized by the order in which the selected algorithms are applied: one group of methods first decomposes an EEG signal and then applies a BSS algorithm, whereas another group first estimates components with a BSS algorithm and then applies a decomposition algorithm. The former usually corrects a single-channel EEG signal, whereas the latter processes multi-channel EEG signals (Fig. 2.3). Hybrid approaches are generally designed to overcome the limitations of a single artifact removal approach and thus exhibit better performance, but they require more careful choices of algorithms that fit the data and/or system requirements (e.g. computational complexity). Examples of the first group, decomposition-BSS for single channels, appear in various forms: wavelet transform followed by (f.b.) ICA [11, 74, 75], EMD f.b. ICA [79, 117] and EMD f.b. CCA [16, 99]. Examples of the second group, BSS-decomposition for multiple channels, include ICA f.b. wavelet transform [1, 12], stationary subspace analysis f.b. EMD [115], ICA f.b. EMD [70], ICA f.b. regression analysis [61] and ICA f.b. adaptive filtering [46].

Fig. 2.3 Types of hybrid methods for EEG artifact removal

4 Discussion

This chapter has presented an overview of essential preprocessing steps for EEG. More detailed guidelines on practical preprocessing procedures can be found in the existing literature (for instance, see [8, 52, 100, 104]). Although there has been substantial progress in the development of EEG preprocessing methods, continuing advances in EEG-based research keep demanding innovations in preprocessing techniques. For instance, pervasive and ambulatory EEG applications foster the development of preprocessing methods that can work with only a few channels in real time [78, 86]. Recent neuroscience approaches using multi-modal brain measurements demand new ways of preprocessing EEG along with other signals such as functional magnetic resonance imaging (fMRI) [17]. EEG hyperscanning techniques, which record brain activity simultaneously from more than one person, possibly at different sites, need a more systematic preprocessing procedure [6]. Here, we briefly discuss some ongoing issues and suggestions in studies involving EEG preprocessing.

When comparing the artifact removal performance of different algorithms, often to demonstrate the superiority of a newly proposed algorithm over existing ones, we encounter the issue of the lack of ground truth. Since the exact waveform of the genuine EEG signal of interest is generally unknown, it is difficult to assess how well a noisy EEG signal has been purified by an artifact removal algorithm [52]. One way to address this issue is to synthesize simulated signals by mixing putative true EEG signals with artifacts and to evaluate an algorithm on the simulated signals [60, 64, 92]. Others have suggested using a well-known EEG waveform evoked by an established cognitive task to test artifact removal methods [104]. For example, an audio-visual task evoking the auditory N100 event-related potential may provide a validation dataset with which researchers can compare artifact removal methods by assessing the N100 waveforms after each method has eliminated the artifacts (see [88] for more details).

Besides the performance evaluation discussed above, there are other issues to address in the development of an EEG artifact removal method.

First, many recent EEG applications demand online preprocessing [26, 43, 86]. Online preprocessing must detect and remove artifacts even in non-stationary environments, adaptively updating the parameters of the algorithms by tracking environmental changes. This requirement sometimes weakens the advantages of algorithms that rely on estimating model parameters from a chunk of training data (e.g. ICA or EMD). Also, computationally expensive machine learning algorithms (e.g. deep learning) may need further justification for online use. In the course of developing a new artifact removal algorithm, it is therefore worth considering online implementation from the start, and a fully automated artifact removal algorithm will underpin online implementation [26, 84].

Second, the availability of reference channels should be taken into consideration. If no reference channel is available, we need to use prior knowledge about the artifacts or infer the artifacts directly from the EEG data [62, 72, 86]. In general, an explicit reference channel can help customize algorithms to each individual, yielding a more precise preprocessing method. Depending on the types of artifacts, reference channels, often acquired with a separate device, can improve EEG preprocessing: EOG channels [22, 23, 61], ECG channels [30], eye trackers [85], accelerometers [24, 25] and contact impedance [119].

Third, it is crucial to match the properties of an algorithm with the statistical and physiological characteristics of the artifacts to be removed. Readers may refer to Urigüen et al. [104] for suggestions of artifact removal algorithms suitable for different types of artifacts.

Fourth, researchers often opt to use public software tools for EEG preprocessing as well as for other EEG data analyses (see [52] for a list of available software tools). Even though many software tools offer complete preprocessing routines and user interfaces for EEG studies, it is recommended to explore thoroughly the theoretical background and technical details of any tool being used; otherwise it is difficult to understand how the EEG signals are processed at each preprocessing step.

Fifth, it is helpful to inform study participants about the problems artifacts cause in EEG recordings so that they can minimize their movements during the main tasks [89]. Although it would also be problematic if participants paid too much attention to movement restriction throughout the whole experiment, a short training phase in which participants practice minimizing movements during task periods, interleaved with more relaxed breaks, can help acquire high-quality EEG data at the recording stage. Such instruction is especially important in studies recruiting younger participants.

Sixth, not only highly contaminated channels but also highly contaminated trials are often eliminated from the analysis. The elimination of contaminated trials is usually conducted after all the preprocessing steps, but its operating principle is similar to that of other preprocessing methods. Generally, trials containing EEG magnitudes greater than a threshold level (e.g. ±150 µV) are classified as contaminated [89]. The threshold must be specified according to the experimental conditions. Rejecting too many trials would cause a shortage of data for the subsequent analyses, so a careful, interactive investigation of preprocessing methods and trial rejection should be considered.

Finally, a developed preprocessing pipeline may call for assessment based on feedback from the designated applications (e.g. classification of user intention for brain-computer interfaces). It is consequently worth deliberating an end-to-end design of EEG signal processing, from the recording to the interpretation of EEG as a whole.