1 Introduction

Independent Component Analysis (ICA) is a multivariate technique that linearly transforms a given random vector into a vector of (maximally) independent components. In the last decade, ICA has been widely used in biomedical applications: e.g., for the detection of the fetal electrocardiogram [20, 45, 64–66], in the analysis and classification of heartbeats [10, 12, 57, 71], in functional magnetic resonance imaging (fMRI) [22, 36, 52], for the development of brain–computer interfaces [27, 80], in photoplethysmography [54, 56], in electromyography [14, 44], for the diagnosis of scoliosis [1], in the modeling of metabolic processes [59], et cetera. ICA is also closely related to the blind source separation problem.

This chapter reviews the use of ICA in the study of the brain and, specifically, of the electroencephalogram (EEG), which records the brain’s electrical activity. Our aim is to provide an introduction to the main points for those who want to get started. The chapter is organized as follows: first of all, we provide basic background information on the structure and function of the brain. The application of ICA to EEG data is reviewed in Sect. 16.4, with special emphasis on the interpretation of the independent components, on the use of ICA for denoising the data, on the search for the sources of the electromagnetic fields in the brain, and on the study of the so-called evoked and event-related potentials. We focus on these specific analyses because ICA has clearly demonstrated its effectiveness for all of them. The ICA of natural images, which has attracted great attention in recent years due to its ability to explain certain characteristics of the simple cells in the visual cortex, is explained in Sect. 16.5. In Sect. 16.6, we present some algorithms specifically devised for the analysis of the EEG and, finally, the last section presents some conclusions.

2 Background of Brain Structure and Function

The brain is the part of the central nervous system that gives rise to thought and consciousness, interprets the stimuli from the environment, and controls and coordinates other organs of the body. It is made up of 15–33 billion neurons and more than 100 billion nerves. There are two kinds of tissue in the central nervous system: grey matter and white matter. Grey matter consists of closely packed neural cell bodies, and can be regarded as the information processing part of the central nervous system. Grey matter is found at the cerebral cortex and also at the surfaces of the cerebellum, the brainstem, the basal ganglia and the limbic system (these terms are explained below). White matter is a vast system of neural connections that contains the nerve fibers (axons) that connect the regions of the brain to each other.

Our brain is composed of three specialized parts that work together: the cerebrum (see Sect. 16.2.1), the cerebellum, and the brain stem (see Fig. 16.1):

  • The brain stem is the link between the spinal cord and the rest of the brain. It performs many basic reflex functions, contributing to the control of the cardiac and respiratory functions and to the maintenance of consciousness.

  • The cerebellum is at the back of the brain and regulates muscular activity. It is responsible for accurate movement coordination, motor learning, equilibrium, posture, balance, and muscle tone. The cerebellum does not decide to make the movements, but executes the motor commands from the cerebrum, calibrating the actions and position according to the information received from the muscles and the inner ear.

Fig. 16.1 Parts of the brain. a Structure of the brain. b Brain cortex

The brain is bathed in cerebrospinal fluid, surrounded and protected by a layer of tissues called meninges, the blood–brain barrier, and the bones of the skull (cranium).

2.1 The Cerebrum

The cerebrum is the dominant part of the brain and comprises two (more or less symmetric) left and right hemispheres, connected by a large white matter structure called corpus callosum. The cerebrum may itself be divided into three subregions (see Fig. 16.1):

  1. The cerebral cortex.

  2. The basal ganglia.

  3. The limbic system.

The outermost layer of brain cells is called the cerebral cortex and is made up of grey matter. Thinking and voluntary movements begin in the cortex. The cortex is only 1.5–4.5 mm deep and, due to its special interest, it will be described in some detail in Sect. 16.2.2. Under the cortex we find a large mass of white matter, within which a number of clusters of neurons (grey matter) called basal ganglia are found. The basal ganglia are involved in perception, attention, motivation and motor functions. They also have an important role in controlling eye movements. Finally, the limbic system (also called the “emotional brain”) consists of several nerve pathways incorporating subcortical structures located on top of the brain stem, including the hippocampus (Footnote 1), the hypothalamus (Footnote 2), the amygdala (Footnote 3), and the thalamus (Footnote 4). The limbic system controls our emotions and plays an important role in learning, memory, control of appetite, and in the regulation of hormones.

Interestingly, it has been suggested that the cerebral cortex performs unsupervised learning, the basal ganglia are devices for reinforcement learning, and the cerebellum performs supervised learning.

2.2 The Cerebral Cortex

The cortex is the outermost layer of brain cells, and deserves special attention. It is a thin layer of neural tissue (only 1.5–4.5 mm deep), forming a narrow convoluted margin of grey matter.

The cortex is a continuous sheet of grey matter. Note, however, that it is conventionally divided in each hemisphere into four lobes, named after the bones under which they are located (see Fig. 16.1b):

  1. The frontal lobe. Under the forehead.

  2. The parietal lobe. Under the top of the head, above the ears.

  3. The temporal lobe. Above the ears and immediately behind and below the frontal lobe.

  4. The occipital lobe. At the back of the head.

Different lobes of the cortex have different functions. Basically, these functions can be grouped into three major categories: cognitive (language, thinking, and interpretation of the world), motor (functions related to the control of voluntary movements), and sensory functions (the ability to process the information from our senses):

  • The frontal lobe is associated with higher cognitive functions (personality, reasoning, and judgement) and, in collaboration with the basal ganglia and the parietal lobe, is also responsible for motor functions (e.g., the primary motor cortex is located at the posterior part of the frontal lobe). Broca’s area, whose functions are linked to speech production, is also in the frontal lobe.

  • The parietal lobe integrates the main somatosensory receptive areas, i.e., those related to the sense of touch, and its functions also include spatial orientation and the ability to read and write. The left part of the parietal lobe is also involved in understanding numbers and solving mathematical problems.

  • The part of the cortex responsible for processing sound is mainly in the temporal lobe (Wernicke’s area, which is usually above the left ear, plays a key role in the comprehension of language). The temporal lobes also control visual and verbal memories.

  • The part of the cortex that processes visual information (i.e., the primary visual cortex) is located at the occipital lobe.

Let us finish with a curiosity: each cerebral hemisphere mainly controls the opposite side of the body and, interestingly, the left part of the cerebrum seems to be responsible for numerical and scientific thinking, and written and spoken language; by contrast, the right part of the cerebrum seems to be linked to artistic capabilities and imagination.

2.3 The Electroencephalogram

In a sense, trying to understand the inner workings of the brain through the EEG is comparable to trying to understand the mechanisms of a motor through the noise it makes. The EEG mainly arises from the postsynaptic currents in the pyramidal neurons of the cortex. Pyramidal neurons are the most abundant type of neuron in the cortex, and take their name from the pyramid-like shape of the cell body (soma). Every neuron receives inputs from many others. In each communication, the “transmitter” neuron is called presynaptic, and the “receiver” neuron is called postsynaptic (the synapse is the point of connection between the neurons). When two neurons communicate, a flow of positively charged ions, the postsynaptic current, is generated from the presynaptic cell to the postsynaptic cell (that current also produces a voltage, called the postsynaptic potential, across the membrane of the postsynaptic neuron). In practice, hundreds, if not thousands, of postsynaptic currents combine in the neuron and, if their sum passes a threshold, an action potential occurs. The action potential is a short spike (1 ms) that propagates along the axon to other neurons, generating new postsynaptic currents. The summation of the electric fields associated with the synchronous postsynaptic currents of millions of neurons can be measured at the scalp, giving the EEG. More precisely, the EEG is a record over time of the potential differences between different locations on the surface of the head.

Figure 16.2 shows the standard location of the electrodes for EEG recording. As an example, Fig. 16.3 shows typical voltage waveforms as can be measured at these locations: in this figure, note that the EEG is not “clean”, but rather is contaminated by a number of artifacts, e.g., a “bump” artifact appears at \(t = 2\) s in the frontal electrodes most probably due to the fact that the subject has blinked or moved the eyes (see Sect. 16.4.4).

Fig. 16.2 Standard placement of electrodes for EEG recording. Letters “F”, “T”, “P” and “O”, respectively, mean frontal, temporal, parietal, and occipital lobe (see Fig. 16.1b). The letter “C” stands for central, and the letter “z” (zero) refers to an electrode placed on the center line. Electrodes on the right hemisphere are numbered with even numbers, and odd numbers are used on the left hemisphere. “Fp” refers to the frontal polar sites

Fig. 16.3 EEG data. The figure represents 5 s of 61 raw EEG channels, obtained from a healthy subject. Data was obtained from the Physionet database (http://www.physionet.org/pn4/eegmmidb/). The placement of the electrodes, as well as an explanation of the nomenclature used for the channels, can be seen in Fig. 16.2. The horizontal axis represents time in seconds. The ICA of these data is presented in Fig. 16.4

3 Overview of EEG Signal Processing

EEG signal processing (see [72] for a book of reference) usually comprises three steps:

  1. Noise reduction.

  2. Feature extraction.

  3. Feature classification.

Some comments are in order.

3.1 Noise Reduction

The EEG signal measurements are usually contaminated by several types of noise and artifacts, for example, electrocardiogram artifacts and eye-induced artifacts. Eye blinks, for example, elicit a large potential difference between the cornea and the retina that can be one order of magnitude larger than the EEG (see Fig. 16.3).

The bandwidth of the EEG is from about 1 to 100 Hz, although we rarely go beyond 50 Hz in clinical practice. Most of the high-frequency noise can be suppressed by applying low-pass filters. DC and baseline drifts can be eliminated using high-pass filters (1 Hz cutoff frequency), and powerline harmonics can be removed with a comb filter. If the subjects under test do not keep their eyes closed during the recording of the EEG, additional processing is required to eliminate eye-blink artifacts. Adaptive filtering has been used for this task, where the necessary reference signals are taken from electrodes located in the vicinity of the eyes. Adaptive filtering can also be used to eliminate electrocardiogram (ECG) artifacts.
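As a rough illustration of the drift-removal step, the following Python sketch applies a first-order high-pass filter with a 1 Hz cutoff to a synthetic signal. The filter structure (a discretized RC filter), the sampling rate of 100 Hz, and the test signal are assumptions made for this example only; in practice, higher order filters are normally preferred.

```python
import math

def highpass(signal, fs, cutoff_hz):
    """First-order (RC-type) high-pass filter; a deliberately simple stand-in."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out = [signal[0]]
    for k in range(1, len(signal)):
        out.append(alpha * (out[-1] + signal[k] - signal[k - 1]))
    return out

fs = 100.0  # assumed sampling rate in Hz
# Synthetic "EEG": a 10 Hz oscillation riding on a constant offset of 10 units.
raw = [10.0 + math.sin(2.0 * math.pi * 10.0 * k / fs) for k in range(500)]
filtered = highpass(raw, fs, cutoff_hz=1.0)

tail = filtered[-100:]  # last second, after the initial transient has died out
residual_offset = sum(tail) / len(tail)
peak = max(abs(v) for v in tail)
print(f"residual offset: {residual_offset:.4f}, peak amplitude: {peak:.3f}")
```

The constant offset is removed while the 10 Hz oscillation passes almost unchanged, which is the behavior expected from a 1 Hz high-pass stage.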

Of course, as the reader well knows, ICA is a valuable tool for denoising and removing artifacts. In fact, denoising and removing artifacts seem to be the primary use for ICA in EEG signal processing. More information will be given in Sect. 16.4.4.

Fig. 16.4 Independent components of the data shown in Fig. 16.3. ICA was performed by using the Infomax algorithm [7]. Scalp maps and equivalent current dipoles (ECDs) of these independent components are shown in Fig. 16.5

3.2 Feature Extraction

After removing noise and artifacts, the second step in EEG signal processing usually consists in extracting relevant features out of the EEG signals.

Since the EEG is highly nonstationary in nature, feature extraction can be performed only after prior segmentation of the signals into short segments, usually not longer than a few seconds. Features are then extracted from each one of them. Within each segment, the signals are considered to be stationary and can then be described by suitable probability distributions. The major problem is, of course, to determine the initial and final time instants of each segment. Usually, the data is first divided into short-time frames and statistics, such as the kurtosis, are computed for each frame. Denoting \(s(n)\) the value of the test statistic in the \(n\)th frame, if \(|s(n) - s(n-1)|\) is greater than a predefined threshold, we assume that the “border” that separates two consecutive segments is located in between the \(n\)th frame and the preceding \((n-1)\)th frame.
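The segmentation rule just described can be sketched in a few lines of Python. The frame length, the threshold, and the synthetic test signal below are hypothetical choices made only for illustration; the kurtosis is used as the test statistic \(s(n)\), as suggested in the text.

```python
import math

def kurtosis(frame):
    """Sample kurtosis (fourth standardized moment, about 3 for Gaussian data)."""
    n = len(frame)
    mean = sum(frame) / n
    var = sum((v - mean) ** 2 for v in frame) / n
    if var == 0.0:
        return 0.0
    return sum((v - mean) ** 4 for v in frame) / (n * var ** 2)

def segment_borders(signal, frame_len, threshold):
    """Return the sample indices where |s(n) - s(n-1)| exceeds the threshold."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    stats = [kurtosis(f) for f in frames]
    return [n * frame_len for n in range(1, len(stats))
            if abs(stats[n] - stats[n - 1]) > threshold]

# Synthetic signal: 300 samples of a sinusoid followed by 300 samples of sparse spikes.
signal = [math.sin(2.0 * math.pi * k / 20.0) for k in range(300)]
signal += [1.0 if k % 25 == 0 else 0.0 for k in range(300)]

borders = segment_borders(signal, frame_len=100, threshold=5.0)
print(borders)
```

The only border reported is the sample at which the signal changes from the sinusoidal regime to the spiky regime, because the kurtosis is nearly constant within each regime.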

Fig. 16.5 This figure shows the scalp topographies and the equivalent current dipoles (ECDs) of some of the independent components in Fig. 16.4 (the number between parentheses is an indicator of the residual error in the estimation of the ECD). The physiological origin of the independent components may be determined from this information. The “dots” indicate the location of the electrodes

Features can be selected in several ways. There exist time-dependent features (mean and peak values of the EEG signals, energy, higher order statistics, entropy, autoregressive (AR) parameters, Lyapunov exponents, \(\ldots \)), frequency-dependent features (power spectral density (PSD) values, band powers, \(\ldots \)), time–frequency-dependent features (matching pursuit, coefficients of the wavelet decomposition, \(\ldots \)), and so on. Spatial-based features are particularly interesting. The most important spatial-based feature is the localization, at a given time, of the regions inside the brain in which the postsynaptic currents (Footnote 5) are most active. This feature provides valuable information on the functioning of the brain and, also, on several diseases and abnormalities. ICA has proved to be a useful preprocessing tool for this task (see Sect. 16.4.2).

Notice finally that, to take into account the time-course variation of the EEG’s characteristics, it is usual in EEG signal processing to concatenate the features from several different time segments into a single feature vector.

3.3 Feature Classification

Finally, we try to classify the features into different classes that, in turn, correspond to different brain activities. For example, epileptic seizures produce a series of sharp spikes in the EEG. Their second- and higher order statistics may be classified to determine automatically the type and severity of the epileptic attack or, even, to distinguish between a true epileptic seizure and a nonepileptic attack.

Linear classifiers, such as Fisher’s linear discriminant and linear support vector machines (SVMs), are probably the most popular classification methods in EEG signal processing. Linear classifiers use hyperplanes to separate the data into classes. Fisher’s linear discriminant assumes that the data is Gaussian distributed, and (roughly speaking) obtains the separating hyperplanes by maximizing the distance between the projected class means relative to the spread of the projected classes. As an alternative, SVMs select the hyperplanes by maximizing the margin, i.e., the distance from the hyperplane to the nearest training points of each class. Interestingly, SVMs also enable us to define nonlinear decision boundaries by previously mapping the data to another space of higher dimensionality.
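To make the idea of a separating hyperplane concrete, the following Python sketch computes a two-class Fisher discriminant for two-dimensional feature vectors. The feature values are invented for illustration, and the decision threshold is simply placed halfway between the projected class means, which is one common convention and not necessarily the choice made in any particular EEG study.

```python
def mean_vec(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """2x2 scatter matrix: sum over the points of (p - m)(p - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_discriminant(class_a, class_b):
    """Return (w, b) such that w . p + b > 0 suggests class a."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # w = Sw^{-1} (ma - mb): direction that best separates the projected classes.
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold halfway between the projected class means.
    b = -0.5 * (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1]))
    return w, b

def classify(w, b, p):
    return "a" if w[0] * p[0] + w[1] * p[1] + b > 0 else "b"

# Hypothetical two-dimensional features (e.g., two band powers) for two classes.
class_a = [(2.0, 2.1), (2.5, 1.8), (3.0, 2.6), (2.2, 2.9)]
class_b = [(0.0, 0.2), (0.5, -0.3), (-0.4, 0.6), (0.3, 0.9)]

w, b = fisher_discriminant(class_a, class_b)
print(classify(w, b, (2.4, 2.0)), classify(w, b, (0.2, 0.1)))
```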

Other classifiers used in EEG signal processing include multilayer perceptrons, Bayes classifiers or Hidden Markov Model (HMM) classifiers. Nearest neighbor classifiers are also popular when unsupervised learning is required. Finally, note that several classifiers can be combined to obtain a better performance using, for example, voting algorithms such as bagging or boosting.

4 The ICA of EEG Data

The use of ICA for studying brain dynamics largely follows from the seminal work [60] by Makeig and co-workers. A good survey of these and other authors’ contributions can be found in [50, 77, 78]. For simplicity, we shall focus mainly on the analysis of the electroencephalogram (EEG), but essentially the same applies to the ICA of magnetoencephalogram (MEG) data. Also note that there exists an excellent and freely available Matlab toolbox, called EEGLAB, that can be used to process EEG data in many ways (www.sccn.ucsd.edu/eeglab/). This software has been used to generate nearly all of the figures in this chapter.

4.1 Interpretation of ICA

In EEG signal processing, unfortunately, ICA raises more questions than we can answer. Let us list some open problems below:

  • What does ICA do? This is at least controversial: since no part of the brain functions completely independently of the others, how can ICA generate physiologically plausible component waveforms [61]? All we can actually expect is that ICA will perform a decomposition of the EEG recordings into temporally independent components. “Temporally independent components” is often interpreted by neurobiologists as signals having “maximally distinct” waveforms. The effective number of independent components contributing to the EEG is a priori unknown, and may vary from one subject to another even under the same conditions.

  • Have the “independent components” got a definite physical origin? Actually, their origin may be distributed across many brain regions and, moreover, is a priori unknown. Each independent component can come from the linear combination of postsynaptic currents spread around all the brain. Having said that, it is very interesting that, in many cases, the independent components seem to be linked to physically compact areas of the brain (see Sect. 16.4.2).

  • What does ICA actually do? Makeig et al. consider that ICA actually reveals a system of synchronous but independent electromagnetic activity within relatively large independent EEG domains [63]. In other words, ICA defines transient brain networks (that may be distributed, linked, and even interpenetrated) whose electromagnetic activity is concurrent and independent, and which together make up the EEG data. This perspective on the brain is different from, but complementary to, that adopted by traditional neuroscience. Note that ICA is not actually concerned with the spatial location of those brain networks, if this makes sense at all, but with the information they provide. What to do with this information, and how to integrate it with other approaches, is an interesting line of open research.

  • What is ICA currently useful for? In any case, ICA has demonstrated its effectiveness as a preprocessing tool: it is certainly able to remove a wide range of artifacts (see Sect. 16.4.4) and is of great assistance in modeling the electromagnetic fields in the brain (see Sect. 16.4.2). Moreover, the ICA decomposition facilitates the analysis and classification of the so-called evoked and event-related potentials (EPs and ERPs) (see Sect. 16.4.3). Finally, although not directly connected with the study of the EEG, we would like to mention that there are strong similarities between the processing of images in the human visual system and ICA (see Sect. 16.5).

4.1.1 Characteristics of the Independent Components

Having identified the ICA model,

$$\begin{aligned} \mathbf{x} = \mathbf{A} \, \mathbf{s}, \end{aligned}$$

where \(\mathbf{x}\) contains the signals recorded by the electrodes and \(\mathbf{s}\) is the vector of independent components, the columns of the mixing matrix \(\mathbf{A}\) give the relative strength of each component at each electrode. A graphical representation of these strengths, depicted at the location of the corresponding electrodes on a cartoon head model, is called the scalp map or scalp topography of the independent component (see Fig. 16.5). It should be noted that the scalp map associated with an independent component is as important as its waveform: the physical origin of the components can often be identified from these maps (e.g., eye activity is located mainly at frontal sites [50]).

Table 16.1 EEG frequency bands

Moving on to other issues, it is well known that the normal EEG waveforms can be classified into six patterns: alpha, beta, delta, gamma, mu, and theta (see Table 16.1). The frequency analysis of the independent components shows that gamma band and near DC dynamics appear to be less well represented than activity in intermediate frequency bands [2]. A recent study of the reliability of the independent components when ICA is trained on insufficient data can be found in [26].

4.2 Identifying the Electromagnetic Brain Sources

We have already mentioned (see Sect. 16.2.3) that the EEG is a record of the electrical activity of the brain that arises from the postsynaptic currents in the pyramidal neurons of the cortex. A postsynaptic current appears to an external observer as if it were generated by a current dipole. When many neurons are active, dipoles with the same orientation sum to form a single large current dipole, which is usually referred to as an “equivalent current dipole” (ECD). Interestingly, areas with a diameter up to 3 cm can be accurately modeled by a single ECD. The potential due to a current dipole of moment \(\mathbf{p}(t)\) at a point specified by a radius vector \(\mathbf{r}\) originated at the position of the dipole is

$$ v(t) = \mathbf{p}(t) \cdot \frac{\mathbf{r}}{4 \, \pi \, \sigma \, |\mathbf{r}|^3} $$

where \(\sigma \) is the conductivity of the medium. Denoting by \(\mathbf{e}_i\), \(i = 1, 2, 3\), the orthonormal basis vectors in the three-dimensional space and letting \(\{s_1(t), s_2(t), s_3(t)\}\) be the coordinates of \(\mathbf{p}(t)\) in this basis, i.e., \(\mathbf{p}(t) = \sum \nolimits _{i=1}^3 s_i(t) \mathbf{e}_i\), it follows that

$$ v(t) = \sum _{i=1}^3 a_i \, s_i(t) $$

where \(a_i = \mathbf{e}_i \cdot \mathbf{r} / (4 \, \pi \, \sigma \, |\mathbf{r}|^3)\). The signals recorded at the electrodes \(v_1(t), \ldots , v_N(t)\) are modeled as the superposition of the potentials due to a large number of dipoles:

$$\begin{aligned} v_1(t)&= a_{11} \, s_1(t) + \cdots + a_{1M} \, s_M(t) + n_1(t) \\&\,\,\, \vdots \\ v_N(t)&= a_{N1} \, s_1(t) + \cdots + a_{NM} \, s_M(t) + n_N(t) \end{aligned}$$

where \(s_i(t)\), \(i = 1, \ldots , M\) denote the dipoles’ coordinates (\(M \gg N\)) and \(n_i(t)\) accounts for the contribution of noise. Inferring the number, spatial localization, and orientation of the ECDs on the cortical surface helps to identify the areas responsible for those brain activities which are of interest, but it is a very difficult inverse problem (one of the main difficulties arising from the fact that the electrodes actually record a mixture of the contributions of all dipoles).
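The single-dipole formula and its expansion into the coefficients \(a_i\) can be checked numerically. In the Python sketch below, the value of \(\sigma \), the position vector, and the dipole moment are arbitrary numbers chosen for illustration, and the basis vectors \(\mathbf{e}_i\) are assumed to be the canonical ones, so that \(\mathbf{e}_i \cdot \mathbf{r}\) is simply the \(i\)th coordinate of \(\mathbf{r}\).

```python
import math

def dipole_potential(p, r, sigma):
    """v = p . r / (4 pi sigma |r|^3) for a dipole of moment p seen at offset r."""
    norm = math.sqrt(sum(c * c for c in r))
    dot = sum(pc * rc for pc, rc in zip(p, r))
    return dot / (4.0 * math.pi * sigma * norm ** 3)

def mixing_coefficients(r, sigma):
    """a_i = e_i . r / (4 pi sigma |r|^3), with e_i the canonical basis vectors."""
    norm = math.sqrt(sum(c * c for c in r))
    return [rc / (4.0 * math.pi * sigma * norm ** 3) for rc in r]

sigma = 0.33               # illustrative value of the medium constant
r = (0.03, 0.04, 0.05)     # illustrative offset from dipole to measurement point
s = (1.0e-8, 0.0, 2.0e-8)  # illustrative coordinates s_1, s_2, s_3 of the moment

v_direct = dipole_potential(s, r, sigma)
a = mixing_coefficients(r, sigma)
v_mixed = sum(ai * si for ai, si in zip(a, s))
print(v_direct, v_mixed)
```

Both expressions give the same potential, which is why each electrode signal can be written as a linear mixture of the dipole coordinates.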

ICA has not been designed to solve the above-mentioned inverse problem (among other things because we have no guarantee that the \(s_i\) are independent). Nevertheless, since ICA is able to remove a wide range of artifacts (see Sect. 16.4.4), it has proven to be an efficient preprocessing step that facilitates the localization of the ECDs [15, 34, 70, 75]. Most importantly (and here we refer back to the previous sections), many independent components have scalp maps that are perfectly compatible with an origin in a single equivalent current dipole or in a pair of dipoles [21]. It follows that determining the ECDs that generate those scalp maps may be much better conditioned than directly solving the original inverse problem. As an example, Fig. 16.5 shows the scalp topographies and the equivalent current dipoles (ECDs) of some of the independent components shown in Fig. 16.4. We can then assume that the independent components originate at the locations of these ECDs. In this way, we can link the independent components to physically compact regions of the brain.

4.3 Evoked and Event-Related Brain Potentials

External stimuli cause the brain to produce electrical potentials known as evoked potentials and event-related potentials (hereafter EPs and ERPs). Measurement of EPs/ERPs involves recording the EEG while stimuli (e.g., sound bursts or light flashes) are presented. Usually, EPs/ERPs are signals of very low amplitude (\(\upmu \)V) that cannot be discerned by the naked eye from the background EEG activity. For this reason, the stimulus is repeated many times and the segments (or epochs) of EEG preceding and immediately following each stimulus presentation are collected and averaged, so that random noise tends to cancel out. The difference between EPs and ERPs is conceptual: while EPs directly reflect the basic processing of the stimulus and occur early in time, ERPs involve later and more complex processes in higher brain structures. Furthermore, EPs usually require averaging more epochs than ERPs.
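The noise-cancelling effect of epoch averaging is easy to reproduce on synthetic data. The Python sketch below hides an invented evoked waveform of unit amplitude in Gaussian noise five times larger and compares a single epoch with the average of 400 epochs. The waveform, the noise level, and the number of epochs are assumptions made for illustration only.

```python
import math
import random

def average_epochs(epochs):
    """Sample-by-sample average of stimulus-locked epochs of equal length."""
    n = len(epochs)
    return [sum(e[k] for e in epochs) / n for k in range(len(epochs[0]))]

def rms_error(x, ref):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, ref)) / len(ref))

rng = random.Random(0)  # fixed seed so that the example is reproducible
# Hypothetical evoked response: a smooth bump of unit amplitude.
template = [math.exp(-((k - 50) / 8.0) ** 2) for k in range(100)]
# Each epoch is the same response buried in independent background "EEG" noise.
epochs = [[v + rng.gauss(0.0, 5.0) for v in template] for _ in range(400)]

single_err = rms_error(epochs[0], template)
avg_err = rms_error(average_epochs(epochs), template)
print(f"single epoch error: {single_err:.2f}, averaged error: {avg_err:.2f}")
```

For independent noise the residual error is expected to shrink roughly as the square root of the number of epochs, which explains why low-amplitude potentials require many stimulus repetitions.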

Multiple studies of EPs/ERPs have benefited from the use of ICA, and we will review a few for illustration [9, 11, 16, 17, 28, 47, 55, 62, 79, 80]. Makeig et al. [62] decomposed ERPs, which were recorded in response to visual stimuli, into three meaningful independent components with physically plausible scalp maps. The time–frequency characteristics of the independent components were related to those of an ERP called P300 (Footnote 6). Jentzsch [47] conducted an experiment in which subjects were instructed to press buttons in response to some property of a visual stimulus, and ICA was applied to auditory grand average ERPs (Footnote 7). The independent component amplitudes appeared to be sensitive to the hand used in the response, and the components themselves turned out to be quite similar to P300 and N1 waves (Footnote 8). Xu et al. [80] also proposed an algorithm for P300 ERP detection. Basically, ICA was applied to raw EEG data, and the independent components most consistent with the P300 wave were first identified and then projected back to the scalp. By doing so, the signal-to-noise ratio of P300 was increased, and the wave was then easily detected.

Bishop et al. [9] were interested in the maturation of the auditory system. They analyzed auditory grand average ERPs elicited by tones in children between 7 and 11 years. For all age groups, two major independent components were found in the data, which mapped onto the projections of single equivalent dipoles located on the temporal lobe. Interestingly, one of the generators was tangentially oriented and showed substantial changes between 7 and 11 years, whereas the other generator was radially oriented and did not show age changes.

Müller et al. [67] studied event-related MEG recordings, where a single patient was subjected to combined auditory and vibrotactile stimulation, generated with a loudspeaker that was also coupled to a balloon that was held by the subject with both hands. ICA was able to separate the somatosensory and the auditory brain responses, and the scalp maps of the independent components were in good agreement with the field patterns of conventional ECDs. Furthermore, these ECDs were located precisely in the brain regions expected to be activated by the respective stimuli. The most interesting part of the paper, however, is that in which the authors discuss the effects of overlearning: while averaging the event-related responses is required to remove the background EEG activity and increase the signal-to-noise ratio, the number of data points available for the ICA algorithms decreases to the same extent, so that the independent components are prone to suffer from overlearning or overfitting. Overlearning produces independent components that are zero almost everywhere except for a single spike or “bump” when HOS-based algorithms are used [73], or independent components with spurious sinusoidal components when SOS-based ICA methods are employed (Footnote 9). As a solution, the authors propose to reduce the dimensionality of the data, together with an additional resampling-based method to evaluate the reliability of the results. Wang et al. [79] used ICA to select the optimal electrode pair, in the sense of enhancing the signal-to-noise ratio, and detect visual EPs.

4.3.1 Analyzing Single-Trial EPs/ERPs

However, averaging EPs/ERPs has several disadvantages. The most important one is that it eliminates the trial-to-trial temporal variability between EPs/ERPs, even though this variability may reflect changes in subject state and reveal information about brain dynamics [61]. When applied to single-trial EPs/ERPs, ICA gives distinctive results that cannot be obtained by conventional approaches: Jung et al. [51], for example, describe the ICA decomposition of single-trial 31-channel ERP epochs (Footnote 10) from 28 normal, 10 autistic, and 12 brain lesion subjects, all of whom were asked to participate in visual attention tasks and to press a button each time they saw a circle appear on the screen. ICA separated out:

  1. Blink-related artifacts and eye movement components.

  2. Independent components whose activation was time-locked to the visual stimuli. When projected back to the scalp (Footnote 11) and then summed to estimate their contributions to the average response, they accounted for nearly all of the P1 and N1 peaks (Footnote 12).

  3. Independent components clearly time-locked to the button press. After being realigned to the median response time and projected back to the scalp, the sum of these independent components was closely related to P300 ERPs.

  4. Independent components whose behavior is similar to that of \(\mu \) brain waves (see Table 16.1). These independent components decrease following the button press.

  5. Spatially overlapping independent components accounting for \(\alpha \) band activity (see again Table 16.1), and that show a variety of relationships to the stimuli and the subject responses.

  6. Nonevent-related background EEG activity.

In conclusion, ICA enhances the amount and quality of the information that can be extracted from ERP data. The authors report that ICA facilitates the analysis and classification (successful clustering experiments are reported) of the different types of response, allowing the study of the interactions between the ERPs and the ongoing EEG activity, as well as a better understanding of the brain dynamics.

4.4 Denoising

It should not be surprising that ICA is primarily used as a blind source separation technique for the removal of artifacts such as those caused by blinking, eye muscle movement (electrooculogram or EOG), facial muscle movements, cardiac activity, etc. [6, 18, 19, 23, 30, 35, 43, 46, 48, 49, 53, 70, 74, 76]. The idea is simply to reconstruct the EEG data as follows:

$$ \mathbf{x}_d = \mathbf{A} \, \mathbf{s}_0 $$

where \(\mathbf{x}_d\) is the denoised EEG vector and \(\mathbf{s}_0\) is the vector of independent components, in which we have set the artifactual components to zero.
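The reconstruction \(\mathbf{x}_d = \mathbf{A} \, \mathbf{s}_0\) amounts to zeroing the rows of the component matrix that were flagged as artifacts and mixing the remainder back to the electrodes. The Python sketch below illustrates this with a made-up mixing matrix and made-up components, standing in for the output of an ICA algorithm; which components are artifactual is assumed to be known.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def remove_components(mixing, sources, artifact_idx):
    """Compute x_d = A s_0, where s_0 is s with the flagged rows set to zero."""
    s0 = [[0.0] * len(row) if i in artifact_idx else list(row)
          for i, row in enumerate(sources)]
    return matmul(mixing, s0)

# Hypothetical ICA output: 3 electrodes, 3 components, 5 time samples.
A = [[0.9, 0.2, 0.1],   # each column is the scalp map of one component
     [0.5, 0.7, 0.3],
     [0.1, 0.4, 0.8]]
S = [[0.0, 0.0, 5.0, 0.0, 0.0],    # component 0: blink-like bump
     [1.0, -1.0, 1.0, -1.0, 1.0],  # component 1: ongoing activity
     [0.5, 0.5, -0.5, -0.5, 0.5]]  # component 2: ongoing activity

x = matmul(A, S)                                  # recorded data, x = A s
x_d = remove_components(A, S, artifact_idx={0})   # denoised data, x_d = A s_0
print(x[0])
print(x_d[0])
```

At the third sample the first electrode drops from about 4.65 to about 0.15 once the blink-like component is removed, while samples where that component is inactive are left unchanged.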

Let us present a simple example. Figure 16.3 shows real EEG data (data were collected for 1 min though only 5 s are shown for clarity). The EEG is contaminated by several artifacts. Specifically, there is strong eye activity in the frontal electrodes (Fp1 and so on): for example, an ocular artifact is clearly visible at \(t = 2\) s, where the short duration of the deflections is compatible with blinking. There is another interfering signal, more visible at the occipital and parietal electrodes (O1 and so on), that is (more or less) periodic with a period slightly shorter than 1 s. It is a “peaky” signal that seems to be an electrocardiogram (ECG) artifact.

Figure 16.6 shows the distribution of the voltage at the head surface at \(t = 2\) s and, for comparison, at \(t = 3\) s (when there are no visible artifacts). The plots confirm that the voltage concentrates over the frontal scalp when an ocular artifact is present. First of all, we rejected the independent components whose scalp maps are similar to Fig. 16.6 (e.g., the independent component 1, see Fig. 16.5). These components are assumed to be responsible for the ocular artifacts. By so doing, we obtained the denoised EEG data shown in Fig. 16.7. Figure 16.8 plots the power spectra of the independent components, showing a large peak around 60 Hz. This is not a typical EEG frequency, and we consider it to be the “signature” of an artifact (probably, it corresponds to the aforementioned ECG artifact or, perhaps, to power line noise). The figure also shows that components 1, 2, 4, 6, and 9 contribute the most at 60 Hz. After rejecting them, we finally obtain the “cleaned” EEG data depicted in Fig. 16.9.

Fig. 16.6

Voltage distribution at the head surface. a Voltage at \(t = 2\) s. b Voltage at \(t = 3\) s

Fig. 16.7

EEG data after rejecting the independent components associated with ocular artifacts

Fig. 16.8

Power spectra of the independent components and distribution of the voltages over the surface of the head at 60 Hz. The figure also shows that the independent components 1, 2, 4, 6, and 9 contribute the most at 60 Hz

Fig. 16.9

EEG data after removing the independent components associated with ocular artifacts and those that contribute the most at 60 Hz
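The spectral criterion used in this example (rejecting the components that contribute the most around 60 Hz) can be sketched as follows. This is only an illustration under stated assumptions: the function `line_power_ratio`, the 1 Hz half-bandwidth, the sampling rate, and the synthetic components are hypothetical choices, and a plain periodogram stands in for whatever spectral estimator was used to produce the figures.

```python
import numpy as np

def line_power_ratio(s, fs, f0, half_band=1.0):
    """Fraction of each component's power that lies within f0 +/- half_band (Hz).

    s: (n_components, n_samples) independent components; fs: sampling rate in Hz.
    """
    s = s - s.mean(axis=1, keepdims=True)
    freqs = np.fft.rfftfreq(s.shape[1], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(s, axis=1)) ** 2      # raw periodogram
    band = np.abs(freqs - f0) <= half_band
    return psd[:, band].sum(axis=1) / psd.sum(axis=1)

# Synthetic illustration: three noise components, one with a 60 Hz interference.
fs = 250
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(0)
s = rng.standard_normal((3, t.size))
s[1] += 5 * np.sin(2 * np.pi * 60 * t)
ratio = line_power_ratio(s, fs, 60.0)
suspect = int(np.argmax(ratio))   # component with the largest share at 60 Hz
```

In practice the ranking would be combined with the scalp maps, as in the example, rather than used alone.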

In the previous example, we identified the artifactual components by visual inspection. The automatic identification of the artifacts seems to be a more powerful approach, and we will briefly review here three representative ideas:

Escudero et al. [23] obtained satisfactory results in denoising MEG data from 11 healthy elderly subjects. They propose a few criteria for the identification of the artifactual components. Cardiac signals, for example, have highly asymmetric density functions and also tend to be leptokurtic (supergaussian), so that they can be discriminated by their skewness and kurtosis coefficients (which are expected to take large values). On the other hand, power line noise and ocular artifacts can be easily detected by examining their frequency characteristics and scalp maps.

Shao et al. [74] also extract several features from the independent components and use a support vector machine (SVM) to classify them as inherent brain activities or artifacts. For each independent component \(s_i\), six extracted features are defined as follows:

  1. 1.

    The ratio between the maximum peak amplitude and the variance of the independent component: \(f_1 = \max (|s_i|)/\sigma _{s_i}^2\) (ocular artifacts, e.g., have a large amplitude).

  2. 2.

    The normalized skewness: \(f_2 = |E[s_i^3]|/\sigma _{s_i}^3\) (as explained above, the distribution of cardiac artifacts is highly asymmetric).

  3. 3.

    The variance of the scalp map of \(s_i\): \(f_3 = \text {var}(\mathbf{a}_i / \Vert \mathbf{a}_i \Vert )\), where \(\mathbf{a}_i\) is the \(i\)th column of the mixing matrix (it seems that the scalp map of the cardiac artifacts has a low variance).

  4. 4.

    A measure (i.e., the Kullback-Leibler divergence) of the difference between the probability density function of the independent component and that of a representative EOG artifact.

  5. 5.

    The Kullback-Leibler divergence of the probability density function of the independent component from that of a reference cardiac artifact.

  6. 6.

    The cross correlation between the independent component and a set of eye-blinking dominated EEG channels (namely, Fp1, Fp2, F3, F4, O1, and O2, see Fig. 16.2).
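The first three features of the list can be computed directly from an independent component and its scalp map. The following sketch is a plain transcription of the formulas for \(f_1\), \(f_2\), and \(f_3\), not the implementation of [74]; the function name is hypothetical, and the component is centered before the third moment is taken (an assumption, since the formula for \(f_2\) implicitly presumes a zero-mean component).

```python
import numpy as np

def first_three_features(s_i, a_i):
    """Features f1, f2, f3 for one independent component s_i and its scalp map a_i."""
    s_i = np.asarray(s_i, dtype=float)
    a_i = np.asarray(a_i, dtype=float)
    centered = s_i - s_i.mean()                       # assumption: work with a zero-mean component
    sigma = centered.std()
    f1 = np.max(np.abs(s_i)) / sigma**2               # peak amplitude over variance
    f2 = np.abs(np.mean(centered**3)) / sigma**3      # normalized skewness
    f3 = np.var(a_i / np.linalg.norm(a_i))            # variance of the normalized scalp map
    return f1, f2, f3

# Toy check: a symmetric two-level signal and a perfectly flat scalp map.
f1, f2, f3 = first_three_features([-1.0, 1.0, -1.0, 1.0], [1.0, 1.0, 1.0, 1.0])

# A one-sided "peaky" signal (ECG-like, synthetic) is strongly skewed.
spiky = np.zeros(1000)
spiky[::100] = 10.0
skew_spiky = first_three_features(spiky, [1.0, 2.0, 3.0, 4.0])[1]
```

The remaining three features require reference artifacts and channel data, so they are not sketched here.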

Along the same lines, Dammers et al. [18] propose other criteria for the automated classification of the independent components as either valid data or noise. For example, the detection of cardiac artifacts is performed in [18] as follows: after a bandpass filtering of the independent component under test (using different frequency bands that cover the spectrum of the ECG, namely, 2–4, 4–8, 8–16, and 10–20 Hz), its normalized phase is calculated by the formula

$$ \Phi (t) = \psi (t) / (2 \pi ) \text { mod } 1 $$

where \(\psi (t)\) is the instantaneous phase of the independent component, obtained by the Hilbert transform. The normalized phase is then divided into segments of 1 s around the R-peaks of the ECG signal. Cardiac artifacts are synchronous with the ECG, and hence different segments are expected to have nearly identical normalized phases. In other words, the samples taken at the same time point of each segment are (nearly) identical, so that their distribution is close to degenerate, i.e., a Dirac delta. On the contrary, when the independent component is not a cardiac artifact, according to the principle of maximum entropy, we can assume that the samples are uniformly distributed (the uniform distribution is the maximum entropy distribution among all distributions supported in the interval \([0, 2\pi ]\)). A statistical test is then used to quantify the deviation of the distribution from the uniform distribution. The authors claim that the proposed criterion is highly sensitive for the identification of weak components caused by cardiac activity.
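A rough sketch of the phase computation may help. It obtains the analytic signal through the FFT (a discrete Hilbert transform), computes the normalized phase \(\Phi (t)\), and cuts it into 1 s segments around given R-peak positions. Note the assumptions: the bandpass filtering is omitted, the R-peak positions are taken as known, and the mean resultant length across segments is used here as a simple stand-in for the statistical test of [18], which is not reproduced. All function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal computed with the FFT (discrete Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def normalized_phase(x):
    """Phi(t) = psi(t) / (2 pi) mod 1, with psi the instantaneous phase."""
    psi = np.angle(analytic_signal(x))
    return (psi / (2 * np.pi)) % 1.0

def phase_locking(x, r_peaks, half_width):
    """Average mean resultant length of the phases across R-peak-centered segments.

    Close to 1 when all segments share the same phase course, small when the
    phases at each time point are spread uniformly. (Stand-in measure, not
    the test used in [18].)
    """
    phi = normalized_phase(x)
    segments = np.array([phi[p - half_width:p + half_width]
                         for p in r_peaks
                         if p - half_width >= 0 and p + half_width <= phi.size])
    resultant = np.abs(np.mean(np.exp(2j * np.pi * segments), axis=0))
    return resultant.mean()

# Synthetic illustration: a 1 Hz rhythm locked to the "R-peaks" versus white noise.
fs = 100
t = np.arange(20 * fs) / fs
r_peaks = np.arange(fs, 19 * fs + 1, fs)          # hypothetical R-peaks, one per second
locked = phase_locking(np.sin(2 * np.pi * t), r_peaks, fs // 2)
noise = np.random.default_rng(0).standard_normal(t.size)
unlocked = phase_locking(noise, r_peaks, fs // 2)
```

The locked signal yields a value near 1 (degenerate distribution at each time point), whereas the noise yields a much smaller value, consistent with the uniform-phase assumption.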

5 ICA of Natural Images

Hubel and Wiesel received the Nobel Prize after showing that certain neurons of the primary visual cortex (the so-called simple cells) give their maximum response in the presence of visual stimuli consisting of localized and oriented structures [37, 38], i.e., the neurons respond only if a line in a particular direction (an “edge”) enters their receptive fields (see Footnote 13). As one moves through the visual cortex in the occipital lobe, one finds columns of neurons that have approximately the same receptive field location, but with different orientation selectivities. It is an important problem for neuroscience to understand the reasons for this organization in the visual sensory system (why are cells directionally dependent?).

Natural images are highly redundant (i.e., nearby pixels are strongly correlated). Barlow suggested that all sensory systems, including the visual one, aim to remove the redundancy in the input data, trying to minimize the amount of information to be processed, and hypothesized that the activation of each neuron in the sensory system should be as statistically independent from the others as possible [35]. Furthermore, Field [24, 25] argued that the responses of the neurons of the primary visual cortex should be sparsely distributed.

How do we perform ICA in image processing? The observed data vectors \(\mathbf{x}_i\) are obtained after the vectorization of a large number of \(M \times N\) pixel patches selected randomly from the images (see Footnote 14). The ICA decomposition of the data can be written as:

$$\begin{aligned} \mathbf{x}_i&= \mathbf{A} \, \mathbf{s}_i \\&= \sum \limits _{k} \mathbf{a}_k \, s_{ik} \end{aligned}$$

where \(\mathbf{a}_k\) denotes the \(k\)th column of the mixing matrix \(\mathbf{A}\), and \(s_{ik}\) is the \(i\)th sample of the \(k\)th independent component. Vectors \(\mathbf{a}_k\) are often called basis vectors, since they provide a generative model of the data. These basis vectors can also be plotted as \(M \times N\) images by an inverse-vectorization operation. When we do that, we get an interesting surprise that takes us back to the previous paragraphs: the images of the basis vectors resemble “edges” with different orientations, lengths, and widths (Fig. 16.10). Furthermore, the distribution of the independent components is sparse, as expected, in the sense that most of the values are close to zero and only a few of them are significantly large. In other words, and very roughly speaking, each patch of the image seems to be formed with only a few simple lines (see Footnote 15). Confirming what Barlow and Field had predicted, only a few neurons are therefore activated at a time.
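The data preparation described above (random patch selection, vectorization, and the inverse vectorization of the basis vectors) can be sketched with NumPy. The ICA step itself is deliberately left out: any algorithm, such as Infomax or FastICA, would be applied to the matrix of vectorized patches to estimate \(\mathbf{A}\). The function names and the toy image are hypothetical.

```python
import numpy as np

def sample_patches(image, patch_shape, n_patches, rng):
    """Return a matrix whose columns are vectorized M x N patches x_i."""
    M, N = patch_shape
    H, W = image.shape
    rows = rng.integers(0, H - M + 1, size=n_patches)
    cols = rng.integers(0, W - N + 1, size=n_patches)
    return np.stack([image[r:r + M, c:c + N].ravel()
                     for r, c in zip(rows, cols)], axis=1)

def basis_images(A, patch_shape):
    """Inverse vectorization: turn each column a_k of A into an M x N image."""
    return A.T.reshape(-1, *patch_shape)

# Toy illustration with a synthetic 20 x 20 "image" and 12 x 12 patches.
rng = np.random.default_rng(0)
image = np.arange(400, dtype=float).reshape(20, 20)
X = sample_patches(image, (12, 12), 50, rng)      # shape (144, 50)

# Here the patches themselves are reshaped back, only to check that the
# inverse vectorization undoes the vectorization; with real data one would
# pass the estimated mixing matrix A instead of X.
recovered = basis_images(X, (12, 12))             # shape (50, 12, 12)
```

The row-major `ravel` and `reshape` calls are mutually inverse, so each recovered array is again a contiguous block of the image.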

These results are not sensitive to the choice of algorithm used. They were first described by Bell and Sejnowski [8], who employed Infomax [7]. Similar results have been obtained using FastICA [39]. Well before the emergence of ICA, Hancock et al. [31] proposed a redundancy reduction approach based on Principal Component Analysis (PCA) only. However, they failed in modeling the receptive fields of the simple cells: according to their results, only a few basis vectors matched oriented and localized patterns. Olshausen and Field [68, 69] proposed an unsupervised learning algorithm that attempted to find a factorial code of independent visual features, generating a set of bases that presented similar properties to the receptive fields of simple cells, i.e., most of them also showed localized and oriented “edges”.

Recent works include [41, 42], which propose a model of the spatial organization of the ICA bases that attempts to imitate the retinotopic organization [29] of the visual cortex, and the papers [13, 40], where the authors analyze the similarities between ICA and the processing of color images in the human visual system.

Fig. 16.10

Typical ICA image-basis obtained from \(12 \times 12\) patches

6 Semi-Blind ICA of Brain Data

Most researchers use traditional blind ICA algorithms for the analysis of brain signals. Nevertheless, we wish to draw attention to three representative approaches [19, 33, 46] that exploit the available a priori knowledge about the data. As a matter of fact, there exists in many cases a priori information about the artifacts that contaminate the data: power line interference, for example, appears at 50/60 Hz and its harmonics, cardiac artifacts are synchronized with heart activity, eye activity is located mainly at frontal sites, etc. The use of this information seems to be a promising possibility.

6.1 Exploiting the Temporal Structure of the Brain Signals

De Clercq et al. [19] use canonical correlation analysis (CCA) for muscle artifact removal in EEG, as follows: given the zero-mean observation vector \(\mathbf{x}(t)\), the idea is to force the source estimates to be maximally correlated with \(\mathbf{x}_1(t) = \mathbf{x}(t -1)\). They thereby aim to enforce the generation of maximally autocorrelated sources, since it is known that brain sources have a high autocorrelation whereas muscle activity is similar to white noise, due to its broader frequency spectrum. The idea is to search for the vectors \(\mathbf{w}\) and \(\mathbf{w}_1\) that maximize the objective function:

$$ \rho (x(t), x_1(t)) = \frac{E[x(t) \, x_1(t)]}{\sqrt{E[x^2(t)]\, E[x_1^2(t)]}} $$

where \(x(t) = \mathbf{w}^T \, \mathbf{x}(t)\) and \(x_1(t) = \mathbf{w}_1^T \, \mathbf{x}_1(t)\). After some algebra, it is found that \(\mathbf{w}\) is an eigenvector of the matrix:

$$ \mathbf{C}_{xx}^{-1} \, \mathbf{C}_{xx_1} \, \mathbf{C}_{x_1x_1}^{-1} \, \mathbf{C}_{x_1x}, $$

where \(\mathbf{C}_{xx}\) and \(\mathbf{C}_{x_1x_1}\) are the auto-covariance matrices of \(\mathbf{x}(t)\) and \(\mathbf{x}_1(t)\), respectively, \(\mathbf{C}_{xx_1}\) is the cross-covariance matrix of \(\mathbf{x}(t)\) and \(\mathbf{x}_1(t)\), and \(\mathbf{C}_{x_1x} = \mathbf{C}_{xx_1}^T\). The source estimates are then simply given by

$$ \mathbf{w}^T \, \mathbf{x}(t). $$

Each eigenvector of the matrix gives a different source estimate. The eigenvalues are the squared canonical correlations, so the eigenvectors corresponding to the lowest eigenvalues yield the least autocorrelated sources, which are expected to be the muscle artifacts. Experiments show that the algorithm is superior to traditional approaches and other ICA techniques based on higher order statistics.
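The procedure can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of [19]: the covariance matrices are replaced by sample estimates, the eigenproblem is written in the standard CCA form (the cross-covariance followed by its transpose), and the function name `bss_cca` and the synthetic two-source mixture are hypothetical.

```python
import numpy as np

def bss_cca(x):
    """CCA between x(t) and x(t-1); sources sorted by decreasing autocorrelation.

    x: (n_channels, n_samples). Returns the source estimates and the
    canonical correlations (square roots of the eigenvalues).
    """
    x = x - x.mean(axis=1, keepdims=True)
    x0, x1 = x[:, 1:], x[:, :-1]                  # x(t) and x_1(t) = x(t - 1)
    n = x0.shape[1]
    Cxx = x0 @ x0.T / n
    C11 = x1 @ x1.T / n
    Cx1 = x0 @ x1.T / n
    # Cxx^{-1} Cxx1 Cx1x1^{-1} Cx1x, using solve() instead of explicit inverses.
    M = np.linalg.solve(Cxx, Cx1) @ np.linalg.solve(C11, Cx1.T)
    eigvals, W = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)
    W = W[:, order].real
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, 1.0))
    return W.T @ x, rho

# Synthetic illustration: a slow rhythm ("brain-like") mixed with white noise
# ("muscle-like").
rng = np.random.default_rng(0)
t = np.arange(5000)
s = np.vstack([np.sin(2 * np.pi * t / 200), rng.standard_normal(t.size)])
x = np.array([[1.0, 0.5], [0.3, 1.0]]) @ s
y, rho = bss_cca(x)
```

With this ordering, the last rows of `y` are the candidates for muscle activity; removing them and back-projecting the remaining sources would give the cleaned data.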

6.2 Using a Temporal Reference

James et al. [46] used a reference signal \(r(t)\) which incorporates the a priori information to guide the search for the independent components. Given the observation vector \(\mathbf{x}\), the following criterion is used in [46]:

$$\begin{aligned} \text { maximize }&f(\mathbf{w}) \\ \text { subject to }&g(\mathbf{w}) \le 0 \\ \text { and }&E[y^2] = 1 \\ \text { and }&E[r^2] = 1 \end{aligned}$$

where \(f(\mathbf{w})\) is the following approximation to the negentropy of the estimated independent component \(y = \mathbf{w}^T \, \mathbf{x}\) [39]:

$$ f(\mathbf{w}) = \left\{ E[G(y)] - E[G(v)] \right\} ^2 $$

where \(v\) is a zero-mean unit-variance Gaussian random variable, \(G(\cdot )\) can be any nonquadratic function, and

$$ g(\mathbf{w}) = \varepsilon - E[r(t)\,y(t)] $$

measures the similarity between \(r(t)\) and \(y(t)\), with \(\varepsilon \) being a threshold (see Footnote 16). This is a constrained optimization problem that can be solved through a Newton-like algorithm [58]. Interestingly, experiments show that the exact waveform of the reference signals is not very important, provided that the temporal features of interest are captured. For example, a good reference signal for the ECG artifact can be simply obtained by passing the contaminated data through a peak detector that highlights the R waves. As \(g(\mathbf{w})\) is a correlation-based measure, the reference signal \(r(t)\) and the independent component must be aligned in time. The authors address this problem by repeatedly applying the method with the reference shifted one sample from one experiment to the next, until the correlation between \(r(t)\) and the estimated source signal \(y(t)\) attains its maximum value.
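The two ingredients of the criterion can be written down as follows. The sketch makes several assumptions that are not in [46]: \(G(u) = \log \cosh u\) is chosen as the nonquadratic function, \(E[G(v)]\) is estimated by Monte Carlo, and the alignment search simply scans circular shifts of the reference against a fixed estimate \(y(t)\), whereas the authors rerun the whole method for each shift. The constrained Newton-like optimization is not shown, and all function names are hypothetical.

```python
import numpy as np

def negentropy_approx(y, n_gauss=100_000, seed=0):
    """f(w) = (E[G(y)] - E[G(v)])^2 with G(u) = log cosh(u) (assumed choice)."""
    def G(u):
        return np.log(np.cosh(u))
    v = np.random.default_rng(seed).standard_normal(n_gauss)  # zero-mean, unit-variance Gaussian
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

def closeness(y, r, eps):
    """g(w) = eps - E[r y]; the constraint is satisfied when g(w) <= 0."""
    return eps - np.mean(r * y)

def best_shift(y, r, max_shift):
    """Shift of r (in samples, circular) that maximizes E[r y]."""
    shifts = list(range(-max_shift, max_shift + 1))
    corr = [np.mean(np.roll(r, k) * y) for k in shifts]
    return shifts[int(np.argmax(corr))]

# Synthetic illustration with unit-variance signals.
rng = np.random.default_rng(1)
gauss = rng.standard_normal(20_000)
laplace = rng.laplace(scale=1 / np.sqrt(2), size=20_000)   # unit variance, supergaussian
f_gauss = negentropy_approx(gauss)
f_laplace = negentropy_approx(laplace)

y = rng.standard_normal(1000)
r = np.roll(y, 3)                 # reference misaligned by 3 samples
k = best_shift(y, r, 10)          # shift that realigns r with y
g_aligned = closeness(y, np.roll(r, k), eps=0.5)
```

The supergaussian signal obtains a clearly larger value of \(f\) than the Gaussian one, and the constraint is met once the reference is realigned.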

6.3 Using Spatial Constraints

Hesse et al. [33] noted that the scalp maps of some expected source signals may be approximately calculated a priori from previous data or using, for example, dipole models. This information may be used as a constraint on the mixing matrix \(\mathbf{A}\), assuming that

$$ \mathbf{A} = [\mathbf{A}_c, \mathbf{A}_u] $$

where \(\mathbf{A}_c\) are columns subject to those constraints, and \(\mathbf{A}_u\) contains unconstrained columns. Roughly speaking, the algorithm may be as follows:

  1. 1.

    Execute one step of some iterative ICA algorithm to find an estimate \({\hat{\mathbf{A}}}\) of the mixing matrix \(\mathbf{A}\).

  2. 2.

    Enforce the constraints on the estimate \({\hat{\mathbf{A}}}\) of \(\mathbf{A}\), ensuring that \({\hat{\mathbf{A}}}\) is of full column rank.

  3. 3.

    Return to 1 until convergence.

The second step can be performed in several ways: for example, the columns of \(\mathbf{A}_c\) may directly overwrite the corresponding columns of \({\hat{\mathbf{A}}}\). Given a column \(\mathbf{a}_c\) of \(\mathbf{A}_c\) and the corresponding column \({\hat{\mathbf{a}}}_c\) of \({\hat{\mathbf{A}}}\), a “softer” and alternative procedure may be to overwrite \({\hat{\mathbf{a}}}_c\) with

$$ p \, \mathbf{a}_c + (1 - p) \, {\hat{\mathbf{a}}}_c $$

where \(p\) is chosen so that the angle between \(\mathbf{a}_c\) and the new \({\hat{\mathbf{a}}}_c\) is below some threshold [32]. Note that the final constrained source signals may not be statistically independent among themselves. Having said that, when applied to EEG recorded during an epileptic seizure (called ictal EEG), the algorithm obtains a coherent and physiologically plausible decomposition of the data. The authors also report good results in removing ocular artifacts.
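The “soft” version of the second step can be illustrated with a short sketch. The rule used to pick \(p\) is an assumption made for this example only: the smallest value on a regular grid for which the angle constraint holds, so that the ICA estimate is altered as little as possible. The text only requires the angle to stay below a threshold, and the function names are hypothetical.

```python
import numpy as np

def angle_between(u, v):
    """Angle (in radians) between two vectors."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def soft_constrain(a_c, a_hat, max_angle, n_grid=101):
    """Overwrite a_hat with p * a_c + (1 - p) * a_hat.

    p is the smallest grid value in [0, 1] such that the angle between a_c
    and the new column does not exceed max_angle (assumed selection rule).
    """
    for p in np.linspace(0.0, 1.0, n_grid):
        a_new = p * a_c + (1 - p) * a_hat
        if angle_between(a_c, a_new) <= max_angle:
            return a_new, p
    return a_c.copy(), 1.0        # fallback: hard constraint

# Toy illustration: the estimate is orthogonal to the constraint, and the
# new column is required to lie within 30 degrees of a_c.
a_c = np.array([1.0, 0.0])
a_hat = np.array([0.0, 1.0])
a_new, p = soft_constrain(a_c, a_hat, np.deg2rad(30))
```

Setting `max_angle` to zero reproduces the hard constraint, in which the column of \(\mathbf{A}_c\) directly overwrites the estimate.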

7 Concluding Remarks

ICA has undoubtedly proven to be a useful tool for removing artifacts from the EEG data. The interpretation of the true “brain components”, however, is still controversial and seems to be an exciting open field for research. The ICA of natural images has also revealed interesting connections with the early models of the visual cortex and the characterization of the so-called simple cells. Finally, the use of a priori information about the brain sources to help the ICA algorithms is a third promising line of research.

This chapter has introduced the use of ICA in the study of electroencephalographic (EEG) data. We hope to have achieved our goal of writing a general and accessible introduction to the problem for those who want to get started in the main topics. We refer the reader to the references for a second and more profound insight into this exciting subject.