1 Introduction

Electroencephalography (EEG) is a non-invasive brain imaging technique that records the electric field on the scalp generated by the synchronous activation of neuronal populations. It has been estimated that if as few as one in a thousand synapses become activated simultaneously in a region of about 40 \(\mathrm{{mm^{2}}}\) of cortex, the generated signal can be detected and recorded by EEG electrodes [1, 2]. Compared with other functional neuroimaging techniques such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), EEG is a direct, real-time measurement of electrical neural activity, so it can reveal exactly when different brain regions are involved in different processing tasks [3]. PET and fMRI, by contrast, measure brain activity indirectly through the associated metabolic or cerebrovascular changes, which are slow and time-delayed [2,3,4].

Although EEG has the advantages of portability, low cost and high temporal resolution, it measures only the electrical potential on the scalp rather than the neural activation on the cortex directly. Reconstructing the activated brain sources from the recorded EEG data is called the inverse problem (also known as source localization). Owing to its significance in clinical applications and in scientific investigations of how the brain functions under different cognitive tasks, numerous methods have been proposed to solve the inverse problem [5,6,7,8]. Furthermore, source localization can serve as a preliminary step for brain connectivity network analysis [9, 10], since the connectivity between two brain regions is calculated from a measure of how closely related the time series from those two regions are, using Pearson correlation, transfer entropy, etc., as summarized in [11]. The next step is to analyze the brain network using complex-network properties such as small-worldness or the clustering coefficient [12,13,14,15,16,17]; indeed, over the past decades the attention of the neuroscience community has shifted from the traditional "segregation" perspective to the "integration" perspective, in which functional and effective connectivity are intensively studied [18].

The EEG inverse problem is very challenging because it is highly ill-posed: the number of dipoles is much larger than the number of electrodes. To obtain a unique solution, regularization techniques can be applied to incorporate prior knowledge of the sources. The most traditional priors are based on minimum energy, resulting in the \(\ell _2\)-norm based minimum norm estimate (MNE) inverse solver [19]. By replacing \(\ell _2\) with \(\ell _1\), the minimum current estimate (MCE) [20] overcomes the over-estimation of active area sizes incurred by the \(\ell _2\) norm. Bio-inspired algorithms such as the genetic algorithm [21] and Particle Swarm Optimization (PSO) have also been used in EEG source estimation [22, 23]. Pascual-Marqui et al. presented standardized low-resolution brain electromagnetic tomography (sLORETA) [7], which enforces spatial smoothness of neighboring sources and normalizes the solution with respect to the estimated noise level. Gramfort et al. proposed the Mixed Norm Estimates (MxNE), which impose sparsity over space and smoothness over time using \(\ell _{1,2}\)-norm regularization [24]; in [25], the same authors proposed the time-frequency mixed-norm estimate (TF-MxNE), which uses structured sparse priors in the time-frequency domain for better estimation of non-stationary and transient source signals. Li et al. presented graph Fractional-Order Total Variation (gFOTV) [26], which enhances the spatial smoothness of the source signal by exploiting the sparsity of the fractional-order TV defined on a triangular mesh. Liu et al. first proposed to combine the EEG inverse problem and the classification problem into a joint framework solved with a sparse dictionary learning technique [27]. In recent years it has been found that imposing sparseness in the original domain is inappropriate for extended source estimation; source extents can instead be obtained by enforcing sparseness in transformed domains with total variation regularization [6, 8, 26, 28], which makes more sense given that the true sources are not distributed in an isolated, independent way.

It is worth noting that the aforementioned algorithms for the EEG inverse problem rely on different prior assumptions about source configurations. To the best of our knowledge, traditional algorithms solve the inverse problem in an unsupervised way and do not use brain-state label information such as happiness, sadness or calm. Because spontaneous brain sources contribute the majority of the EEG data, label information plays a key role in finding the discriminative, task-related sources [27, 29, 30]. The remaining questions are why not utilize label information to better reconstruct discriminative sources, and how to fuse that label information into traditional inverse models.

In this research, we propose a graph regularized EEG inverse model that uses label information to estimate source extents with a sparsity constraint in the transformed domain. The graph regularization is derived from the requirement of intra-class consistency among the solutions at different time points. The proposed model is solved with the alternating direction method of multipliers (ADMM), and its effectiveness in finding discriminative source extents is illustrated by numerical experiments.

2 Inverse Problem

EEG data are mostly generated by pyramidal cells in the gray matter, oriented perpendicular to the cortex. It is well established to assume that the orientation of cortical sources is perpendicular to the surface [31]. With a so-called lead field matrix L that describes the superposition of linear mappings from the cortical sources to the EEG recording sensors, the forward model can be described as

$$\begin{aligned} X= LS+\varepsilon , \end{aligned}$$
(1)

where \(X \in {\mathbb {R}^{{N} \times {T}}} \) is the EEG data measured at N electrodes on the scalp for T time samples and \(\varepsilon \) represents the noise. Each column of the lead field matrix \(L \in {\mathbb {R}^{{N} \times {D}}}\) represents the electrical potential mapping pattern of a cortical dipole to the EEG electrodes, where D is the number of dipoles. Figure 1 illustrates the process of building a realistic brain model: the cortical sources (dipoles) are represented by a triangular mesh, each triangle representing one dipole. Here \(S \in {\mathbb {R}^{{D} \times {T}}}\) represents the electrical potentials at the D source locations, for all T time points, that are transmitted to the surface of the scalp. Because L is a wide matrix with many more columns than rows, the inverse problem of inferring S from X is ill-posed. To seek a unique solution, a regularization technique can be applied: an estimate of S is calculated by minimizing the following objective function:

$$\begin{aligned} \mathop {\text {argmin}}\limits _S\,\left\| {X - LS} \right\| _F^2 + \lambda R (S), \end{aligned}$$
(2)

where the first term is the fidelity term measuring the fitting error and the second term is the regularization term. Here \(\left\| {\cdot }\right\| _F\) is the Frobenius norm, which sums the squared values of all entries in a matrix. The regularization term R(S) encourages spatially or temporally smooth source configurations and enforces neurophysiologically plausible solutions. For example, to restrict the total number of activated sources, the \(\ell _0\) norm would ideally be used, as it measures the cardinality of the sources. However, the \(\ell _0\)-norm regularized problem is NP-hard from the perspective of computational complexity. Instead of the \(\ell _0\)-norm, many convex or quasi-convex norms, such as \(\ell _1\), have been used as \(R(\cdot )\). More precisely, source localization can be obtained by solving the following \(\ell _1\)-regularized model:

$$\begin{aligned} \mathop {\text {argmin}}_{s_i}\,\left\| {{x_i} - L{s_i}} \right\| _2^2 + \gamma {\left\| {{s_i}} \right\| _1}, \end{aligned}$$
(3)

where \(s_i\), the i-th column of S, is the sparse coding of all the dipoles, and the nonzero entries of \(s_i\) represent the activated sources. We want to find the best \(s_i\) to fit the observed data \(x_i\) while maintaining a sparse configuration.
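
To make this pipeline concrete, the following is a minimal NumPy sketch (not part of the original experiments) that simulates a toy forward model per Eq. (1) and solves the \(\ell _1\)-regularized model of Eq. (3) with ISTA, one of many possible solvers; all dimensions, the noise level and \(\gamma \) are illustrative assumptions, far smaller than a real head model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy forward model (Eq. 1): N electrodes, D dipoles, one time sample.
N, D = 32, 400
L = rng.standard_normal((N, D))
L /= np.linalg.norm(L, axis=0)            # unit-norm lead-field columns

s_true = np.zeros(D)
s_true[rng.choice(D, size=5, replace=False)] = 1.0
x = L @ s_true + 0.01 * rng.standard_normal(N)

# ISTA for Eq. (3), written for the equivalent 1/2-scaled fidelity term:
#   s <- shrink(s - t * L^T (L s - x), t * gamma)
gamma = 0.05
t = 1.0 / np.linalg.norm(L, 2) ** 2       # 1 / Lipschitz constant of the gradient
s = np.zeros(D)
for _ in range(500):
    z = s - t * (L.T @ (L @ s - x))
    s = np.sign(z) * np.maximum(np.abs(z) - t * gamma, 0.0)  # soft threshold

print("estimated support:", np.flatnonzero(np.abs(s) > 0.1))
print("true support:     ", np.flatnonzero(s_true))
```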

Fig. 1.

From MRI images to a realistic brain model. After the MRI scans of the head are gathered, tissue segmentation is conducted, followed by mesh generation. After conductivity values are assigned to the different tissues and the electrodes are co-registered with the mesh model, the boundary element method (BEM) is used to solve the forward model. Each triangle represents a brain source, whose direction is assumed to be perpendicular to the triangular surface.

3 Graph Regularized EEG Source Imaging in Transformed Domain

In this section, we first introduce the sparsity-related regularization in the transformed domain, which promotes the discovery of extended source patches rather than isolated sources on the cortex. We then present our proposed graph regularized model in the transformed domain in detail.

3.1 EEG Source Imaging in Transformed Domain

As the cortical sources are discretized with a triangular mesh, naive distributed source imaging using the \(\ell _1\) norm on S causes the estimated sources to be distributed in an isolated way instead of grouped as source extents. To encourage the estimation of source extents, Ding [6] proposed a sparsity constraint in a transformed domain, in a model termed Variation-Based Sparse Cortical Current Density (VB-SCCD). In [8], Zhu et al. proposed the use of multiple priors, including variation-based and wavelet-based constraints. The VB-SCCD model can also be extended by adding sparse regularization in the original source domain, yielding Sparse VB-SCCD (SVB-SCCD) [32]. The common yet most important term is the TV term, which penalizes the gradients of the source distribution so that spatial smoothness is guaranteed. The total variation is defined as the \(\ell _1\) norm in the transformed domain, using a linear transform characterized by the matrix \(V\in \mathbb {R}^{P\times D}\) whose entries are

$$\begin{aligned} \left\{ {\begin{array}{*{20}{rl}} {v_{ij}} = 1;{v_{ik}} = - 1; &{} {\text {if voxels}}\, j,\! k\, {\text {share edge}}\, i ; \\ {v_{ij}} = 0; &{} {\text { otherwise. }} \\ \end{array}}\right. \end{aligned}$$
(4)

where D is the number of voxels and P is the number of edges in the triangular grid. The motivation for the total variation regularization is illustrated in Fig. 2: when the red triangle is estimated to be activated, the neighboring blue voxels are encouraged to be activated as well.

Fig. 2.

Illustration of the design purpose of the matrix V. When one voxel (in red) is activated, its neighboring voxels (in blue) are encouraged to be activated as well, so as to achieve a smaller objective value in Eq. 5 (Color figure online)

We can see that the matrix product VS contains the differences in amplitude between every pair of adjacent voxels. The VB-SCCD model then has the following form:

$$\begin{aligned} \langle {S} \rangle = \mathop {\text {argmin}}\limits _S\,\left\| {X - LS} \right\| _F^2 + \lambda {\left\| {VS} \right\| _{1,1}}. \end{aligned}$$
(5)
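
As an aside, the transform matrix V of Eq. (4) can be assembled directly from the triangular mesh. Below is a small sketch under the assumption that the mesh is given as a list of vertex-index triples, one triangle per voxel (dipole); the edge enumeration order is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.sparse import lil_matrix

def build_variation_matrix(triangles, n_voxels):
    """Assemble the sparse transform V of Eq. (4).

    Two voxels j, k "share edge i" when their mesh triangles have a
    common vertex pair; each such pair contributes one row of V.
    """
    edge_to_voxels = {}
    for t, (a, b, c) in enumerate(triangles):
        for e in ((a, b), (b, c), (a, c)):
            edge_to_voxels.setdefault(tuple(sorted(e)), []).append(t)

    # Interior edges are shared by exactly two triangles (voxels).
    pairs = [v for v in edge_to_voxels.values() if len(v) == 2]
    V = lil_matrix((len(pairs), n_voxels))
    for i, (j, k) in enumerate(pairs):
        V[i, j], V[i, k] = 1.0, -1.0
    return V.tocsr()

# The TV term of Eq. (5) is then simply np.abs(V @ S).sum().
```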

3.2 Discriminative Source Reconstruction with Graph Regularization

Previous studies [18, 33] indicated that spontaneous brain sources contribute most of the EEG signal; the neurons in our brain remain active even when the subject is in an eyes-closed resting state. As a result, the source solution given by a traditional EEG inverse algorithm is likely to be corrupted by background activity. As a simple example, suppose \(x_1 = L(s_0+ s_1)\) and \(x_2 = L(s_0+ s_2)\), where \(s_0\) is the spontaneous source common to the two classes, brain status 1 and status 2, while \(s_1\) is the discriminative source for class 1 and \(s_2\) is the discriminative source for class 2. Without label information, traditional methods such as MNE, MCE and MxNE estimate the overall source activation pattern instead of the task-related one. Worse, when the magnitude of \(s_0\) is much greater than those of \(s_1\) and \(s_2\), traditional methods are even more likely to fail.
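
To make this failure mode concrete, here is a toy sketch (illustrative sizes, minimum-norm solutions via the pseudo-inverse) showing that, without labels, the two class-wise estimates are dominated by the strong common source \(s_0\) and are therefore nearly indistinguishable:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 32, 400
L = rng.standard_normal((N, D))

s0 = np.zeros(D); s0[10:14] = 5.0     # strong spontaneous common source
s1 = np.zeros(D); s1[100:104] = 0.5   # weak discriminative source, class 1
s2 = np.zeros(D); s2[300:304] = 0.5   # weak discriminative source, class 2

x1, x2 = L @ (s0 + s1), L @ (s0 + s2)

# Unsupervised minimum-norm estimates: both recover mostly s0, so the
# class-1 and class-2 solutions differ very little.
s1_hat = np.linalg.pinv(L) @ x1
s2_hat = np.linalg.pinv(L) @ x2
print("relative class difference:",
      np.linalg.norm(s1_hat - s2_hat) / np.linalg.norm(s1_hat))
```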

Inspired by graph regularization in the computer vision community for discovering discriminative features of images [34,35,36,37], the proposed model adds a Laplacian graph regularization term to the original VB-SCCD and is termed GVB-SCCD in this paper. Although we could consider a graph regularized SVB-SCCD model, with sparsity imposed on both the original source domain and the TV-transformed domain [8], its parameter tuning would be very complicated, since three parameters would balance the trade-off among data fidelity, sparsity in the original domain, sparsity in the transformed domain, and graph regularization; its effectiveness will be evaluated in future work. The graph regularization seeks to eradicate spurious sources that are not consistent within a class. As a first step, the source common to all classes is decomposed using the Voting Orthogonal Matching Pursuit (VOMP) algorithm proposed in [29]. The GVB-SCCD model is given below:

$$\begin{aligned} \langle {S} \rangle = \mathop {\text {argmin}}\limits _S \frac{1 }{2} \left\| {X - LS} \right\| _F^2 + \frac{\beta }{2}\sum \limits _{i,j = 1}^T {\left\| {{s_i} - {s_j}} \right\| _2^2{M_{ij}}} + \lambda {\left\| {VS} \right\| _{1,1}}. \end{aligned}$$
(6)

Here \({\left\| \cdot \right\| _{1,1}}\) denotes the \(\ell _1\) norm of a matrix, equal to the sum of the absolute values of all its entries, and \(X \in {\mathbb {R}^{{N} \times {T}}} \) is the EEG data, where T is the number of samples from the different classes. The second term, which penalizes inconsistent source solutions within the same class, is the graph regularization term. The matrix \(M \in \mathbb {R}^{T \times T}\) is defined as:

$$\begin{aligned} {M_{ij}} = \left\{ {\begin{array}{*{20}{cl}} + 1, \qquad &{}{\text {if (}}{s_i}{\text {,}}{{\text {s}}_j}{\text {) belong to the same class;}} \\ 0, \qquad &{} {\text {otherwise.}} \end{array}} \right. \end{aligned}$$
(7)

The goal of this formulation is to find discriminative sources by decomposing out the common source while maintaining the consistency of the intra-class reconstructed sources. Defining the degree matrix as the diagonal matrix whose entries are the row sums of the symmetric matrix M, \(D_{ii} = \sum \nolimits _j {{M_{ij}}} \) (not to be confused with the number of dipoles), and the graph Laplacian as \(G = D - M\) [34], the second term of Eq. (6) can be expanded as:

$$\begin{aligned} {\begin{matrix} \sum \limits _{i,j = 1}^T \left\| {{s_i} - {s_j}} \right\| _2^2{M_{ij}} = \sum \limits _{i,j = 1}^T ({s_i}^Ts_i+{s_j}^Ts_j -2{s_i}^Ts_j )M_{ij} = 2\mathrm {Tr}(SGS^T). \end{matrix}} \end{aligned}$$
(8)
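
The construction of M, the degree matrix and G from the class labels is straightforward; below is a short sketch (illustrative labels and sizes) that also checks the identity of Eq. (8) numerically. Self-pairs (\(i = j\)) contribute nothing to either side, so keeping them in M is harmless.

```python
import numpy as np

rng = np.random.default_rng(2)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])   # class of each time sample
T, n_dip = labels.size, 50

M = (labels[:, None] == labels[None, :]).astype(float)  # Eq. (7)
Deg = np.diag(M.sum(axis=1))                            # degree matrix
G = Deg - M                                             # graph Laplacian

S = rng.standard_normal((n_dip, T))
lhs = sum(np.sum((S[:, i] - S[:, j]) ** 2) * M[i, j]
          for i in range(T) for j in range(T))
print(np.allclose(lhs, 2.0 * np.trace(S @ G @ S.T)))    # True: Eq. (8)
```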

Substituting Eq. (8), Eq. (6) can be written as

$$\begin{aligned} {\begin{matrix} \langle {S} \rangle = \mathop {\text {argmin}}\limits _{S} \frac{1 }{2} \left\| {X - LS} \right\| _F^2 + \beta \,\mathrm {Tr}({S}GS^T) + \lambda {\left\| {VS} \right\| _{1,1}}, \end{matrix}} \end{aligned}$$
(9)

where the second term is called graph regularization. Using variable splitting, Eq. (9) is equivalent to

$$\begin{aligned}&\mathop {\min }\limits _S \frac{1 }{2} \left\| {X - LS} \right\| _F^2 + \lambda {\left\| Y \right\| _{1,1}} + \beta (\mathrm {Tr}(SGS^T)) \nonumber \\&s.t.\quad Y = VS. \end{aligned}$$
(10)

The new formulation makes the objective function separable with respect to the variables S and Y. For Problem (10), the minimization can also be written column-wise in the separable form

$$\begin{aligned}&\mathop {\min }\limits _{s_i} \frac{1 }{2} \left\| {{x_i} - L{s_i}} \right\| _2^2 + \beta {G_{ii}}s_i^T{s_i}+ s_{i}^T{h_i} + \lambda {\left\| {y_i} \right\| _1} \nonumber \\&s.t.\quad y_i = Vs_i, \end{aligned}$$
(11)

where \(h_i = 2 \beta \sum _{j \ne i}{G_{ij}s_j}\), \(x_i\), \(s_i\) and \(y_i\) are the i-th columns of the matrices X, S and Y, respectively, and \(G_{ij}\) is the (i, j)-th entry of the matrix G.

4 Optimization with ADMM Algorithm

After reformulating Problem (11) as an unconstrained augmented Lagrangian function, it can be solved using ADMM [38]:

$$\begin{aligned} L_\rho (s_i,y_i,u_i)&= \frac{1 }{2} \left\| {{x_i} - L{s_i}} \right\| _2^2 +\lambda \left\| y_i \right\| _1 + \beta {G_{ii}}s_i^T{s_i}+ s_{i}^T{h_i} \\&+\,u^T_i(Vs_i-y_i) + \frac{\rho }{2} \left\| Vs_i - y_i \right\| ^2_2 \nonumber . \end{aligned}$$
(12)

The variables \(s_i\), \(y_i\) and \(u_i\) are updated sequentially, with the hope that each subproblem has a closed-form solution or can be computed efficiently. In short, ADMM results in two sub-problems, Eqs. (13) and (14), plus a dual variable update, Eq. (15):

$$\begin{aligned} s_i^{(k + 1)}:&= \mathop {\text {argmin}}\limits _s\,{L_\rho }(s,y_i^{(k)},{u_i^{(k)}}) \nonumber \\&= \mathop {\text {argmin}}\limits _s \frac{1 }{2} \left\| {{x_i} - L{s}} \right\| _2^2 + \beta {G_{ii}}s^T{s}+ s^T{h_i} +\frac{\rho }{2} \left\| Vs - y_i^{(k)} + \frac{u_i^{(k)}}{\rho } \right\| ^2_2 \end{aligned}$$
(13)
$$\begin{aligned} y_i^{(k + 1)}: = \mathop {\text {argmin}}\limits _y\,{L_\rho }(s_i^{(k+1)},y,{u_i^{(k)}}) = \mathop {\text {argmin}}\limits _y\,\lambda {\left\| y \right\| _1} + \frac{\rho }{2}\left\| {Vs_i^{(k+1)} - y + \frac{{u_i^{(k)}}}{\rho }} \right\| _2^2 \end{aligned}$$
(14)
$$\begin{aligned} u_i^{(k + 1)}:= u_i^{(k)} + \rho ( Vs_i^{(k + 1)} - y_i^{(k + 1)}). \end{aligned}$$
(15)

The update of \(s_i^{(k+1)}\) has a closed form solution, which is

$$\begin{aligned} s_i^{(k + 1)} = {P^{ - 1}}[{L^T}{x_i} - {h_i} + \rho {V^T}(y_i^{(k)} - \frac{u_i^{(k)}}{\rho })], \end{aligned}$$
(16)

where \(P=L^TL+2\beta G_{ii} I+\rho V^{T}V \). The update of \(y_i^{(k + 1)}\) involves the proximal operator of the \(\ell _1\)-norm and can be expressed as:

$$\begin{aligned} y_i^{(k+1)} = \text {shrink}\left( {Vs_i^{(k+1)} + \frac{{u_i^{(k)}}}{\rho }},\frac{\lambda }{\rho }\right) , \end{aligned}$$
(17)

where the shrinkage function is defined as

$$\begin{aligned} {\text {shrink}}(v,\mu ) = \left( {\left| v \right| - \mu } \right) _+ {\text {sgn}} \left( v \right) . \end{aligned}$$
(18)

Here \((x)_+\) equals x when \(x>0\) and 0 otherwise, and \({\text {sgn}}(v)\) is the sign of v. The shrinkage function provides an efficient way to solve the \(\ell _1\)-regularized minimization problem because it is computed element-wise.
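
In NumPy, the shrinkage operator of Eq. (18) is a one-liner applied element-wise to vectors or matrices:

```python
import numpy as np

def shrink(v, mu):
    """Soft thresholding, Eq. (18): (|v| - mu)_+ * sgn(v), element-wise."""
    return np.maximum(np.abs(v) - mu, 0.0) * np.sign(v)
```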

The procedure for solving Problem (10) is summarized in Algorithm 1. Although updating all the \(s_i\)'s is time-consuming, their separability suggests further speed-ups through parallel computing techniques.

Algorithm 1. ADMM for solving the GVB-SCCD model (10).
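
For reference, here is a condensed sketch of the per-column updates of Algorithm 1, following Eqs. (13)-(17). The iteration cap, stopping rule and dense linear algebra are illustrative choices (a real implementation would use a sparse V and refresh \(h_i\) as the other columns are updated), and P is assumed positive definite so that it can be Cholesky-factorized once. It reuses the `shrink` function defined above.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def admm_column(L, V, x_i, h_i, G_ii, lam, beta, rho,
                max_iter=200, tol=1e-6):
    """ADMM updates of Eqs. (13)-(17) for a single column s_i of S."""
    n_dip, n_edge = L.shape[1], V.shape[0]
    # P = L^T L + 2*beta*G_ii*I + rho*V^T V is fixed: factorize it once
    # so every s-update (Eq. 16) costs only two triangular solves.
    P = L.T @ L + 2.0 * beta * G_ii * np.eye(n_dip) + rho * (V.T @ V)
    chol = cho_factor(P)

    s, y, u = np.zeros(n_dip), np.zeros(n_edge), np.zeros(n_edge)
    for _ in range(max_iter):
        # s-update: closed form of Eq. (16).
        s = cho_solve(chol, L.T @ x_i - h_i + rho * (V.T @ (y - u / rho)))
        # y-update, Eq. (17): proximal operator of the l1 norm.
        y_new = shrink(V @ s + u / rho, lam / rho)
        # Dual update, Eq. (15).
        u = u + rho * (V @ s - y_new)
        if np.linalg.norm(y_new - y) < tol:
            return s
        y = y_new
    return s
```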

5 Numerical Experiment

A realistic head model called the "New York Head" [39] and synthetic data with known ground truth were used to validate our method. The lead field matrix of this head model has dimension 108 by 2004. For each class of brain status we sampled 1 s of data at a 200 Hz sampling rate, and the number of classes was fixed at 3. An extended common source patch shared by all 3 classes was assumed to be activated, with 4, 6 or 8 neighboring sources respectively; the task-related discriminative source extent for each class had the same number of activated neighboring sources. The magnitude of the common source, representing the spontaneous activation pattern, was set to 0.8 with a standard deviation of 0.1, and the discriminative sources were assigned magnitude 0.2 with a standard deviation of 0.05. To mimic noise from other source locations, we set the magnitude of 10 spurious sources to 0.35 or 0.5 in the simulation experiments. The spurious sources were chosen randomly at each time point and were thus not consistent within a class. Under these settings, the actual number of activated sources can be 18, 22 or 26. According to the results in [40], the recovery rate drops quickly as the number of dipoles increases; when the number of activated sources is 20, the recovery rate is about \(40\%\), since the lead field matrix has very high coherence across columns. The parameters \(\lambda \) and \(\beta \) in Eq. (9) were set to 0.1 and 1, respectively. Each configuration was repeated 5 times, and the ground truth source locations were randomly simulated in 8 different Regions of Interest (ROIs) defined in [41]. The initialization of S in the ADMM algorithm was obtained by solving the \(\ell _1\)-constrained problem with the Homotopy algorithm.

Table 1. Accuracy summary

The accuracy is based on the shortest-path distance (in mm) from the ground truth location to the reconstructed source location along the cortical surface. The results under the different simulation settings are summarized in Table 1, where PAcc denotes the location accuracy (in mm) for the primary common source and DAcc denotes the discriminative source location accuracy averaged over the 3 classes. The simulation results show that our proposed method outperforms all benchmark methods on the discriminative sources under all configurations, while also performing well on the primary common source. Moreover, comparing the benchmark methods themselves, the \(\ell _1\) based methods within the MCE framework are better than the \(\ell _2\) based methods such as sLORETA and MNE: the \(\ell _2\) based methods over-estimate the sources and miss the task-related discriminative sources. The \(\ell _1\)-constrained MCE recovers the primary source well; however, the discriminative source extent is corrupted by spurious sources. Our method provides a sparse and precise reconstruction of both the primary and the discriminative sources.
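
For completeness, this localization error could be computed, for instance, with a shortest-path (Dijkstra) search over the mesh adjacency graph. Below is a sketch assuming a sparse D-by-D matrix whose nonzero entries are the Euclidean distances (in mm) between adjacent dipoles; the function name and interface are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra

def localization_error(adjacency_mm, true_idx, est_idx):
    """Shortest-path distance (mm) along the cortical mesh from the
    ground-truth dipole to the reconstructed dipole."""
    dist = dijkstra(adjacency_mm, directed=False, indices=true_idx)
    return dist[est_idx]
```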

Fig. 3.

Ground truth for all 3 classes aggregated in one figure with a common source and 3 discriminative sources

Fig. 4.

Illustration of primary source reconstruction and discriminative source reconstruction by different methods. The first row is the source solution provided by MNE, the second row is the solution from sLORETA, the third and fourth rows are the DALM and FISTA methods within the MCE framework, and the last row is our proposed method, GVB-SCCD.

For illustration, the ground truth sources and the sources reconstructed by the different algorithms are given in Figs. 3 and 4, respectively. These illustrative results are from the case in which the spurious source magnitude (SSM) equals 0.5 and the patch size equals 6. The brain imaging results in Fig. 4 also show that the \(\ell _2\) based methods gave diffuse solutions and failed to recover the discriminative sources, whereas our proposed method recovered the primary source as well as the discriminative sources thanks to the intra-class consistency imposed by the graph regularization.

6 Conclusion

The EEG inverse problem is usually solved independently under different brain states. In this research, we used label information to find discriminative source patches by fusing it into a graph regularization term. The proposed model, GVB-SCCD, finds task-related activation patterns better than traditional methods: the additional graph regularization promotes intra-class consistency and thereby eliminates spurious sources. An ADMM algorithm was derived to solve the GVB-SCCD model, and the numerical experiments validated its better performance compared with the benchmark methods. One drawback of the proposed framework is that the TV term only enforces smoothness of the first spatial derivative; future work will consider smoother, higher-order TV to enhance the smoothness of the reconstructed sources.