
1 Introduction

As a direct measurement of neural electrical activity, electroencephalography (EEG) offers temporal resolution down to the millisecond scale. EEG Source Imaging (ESI) aims to map the EEG recordings on the scalp back to the electrical potentials of the brain sources. ESI enables noninvasive connectivity analysis in the source space at high temporal resolution, which is impossible with fMRI (low temporal resolution) or intracranial EEG (invasive) [1]. Since the number of electrodes is usually much smaller than the number of brain sources, the ESI problem is highly ill-posed. A variety of methods have been proposed to address this challenging problem under different neurophysiological assumptions, formulated with various regularization techniques such as the \(\ell _{1}\) or \(\ell _{2}\) norm, the total variation norm [2], and others summarized in [3, 4].

To encourage temporal smoothness, a number of regularization techniques based on spatiotemporal mixed norms have been developed, including the Mixed Norm Estimate (MxNE), which uses \(\ell _{1,2}\)-norm regularization [5]; the time-frequency mixed-norm estimate (TF-MxNE), which uses structured sparse priors in the time-frequency domain for better estimation of non-stationary and transient source signals [6]; and STOUT (spatio-temporal unifying tomography), which combines the advantages of sparse basis field expansions and TF-MxNE by expanding the source current density in appropriate spatio-temporal basis functions.

One common limitation of existing ESI algorithms is that they only consider noise at the sensor level or a noise covariance in the source space. However, spurious noise is hard to capture with a source covariance prior. When reconstructed, the estimated source consists of both task-related sources and spurious sources in the source space; the true task-related sources are thus corrupted, which motivates us to develop new algorithms to recover them. Two commonly accepted assumptions on task-related source activation patterns, (1) spatial sparsity and (2) temporal continuity, together imply a low-rank structure in the source space. To better discover the task-related sources, we impose a low-rank penalty on the source signal to estimate latent sources with low-rank and sparse properties, aiming to remove the spurious sources from the source space. Also, instead of imposing an auto-regressive dynamical model, we use a nonparametric penalty term for temporal smoothness that directly penalizes the dissimilarity of temporally neighboring samples. It is worth noting that we used a graph regularization term in our previous papers; however, there the graph was defined to be fully connected among all points within one class [7,8,9], which inevitably drives all activation patterns at different time points to have the same magnitude, thus oversimplifying the dynamic behavior of the brain and limiting its applicability to realistic cases.

In this paper, we propose a novel EEG source imaging model based on temporal graph regularized low-rank representation. The model is solved using the alternating direction method of multipliers (ADMM) [10]. We conducted extensive numerical experiments to verify its effectiveness in discovering task-related low-rank sources; the reconstructed solution is temporally smooth and spatially sparse. The contributions of our paper are summarized as follows:

  1. A low-rank representation (LRR) model is proposed for the EEG inverse problem, motivated by the low-rank property of true task-related source configurations.

  2. We redefine the graph regularization to relax the strong assumption made in previous research; the newly defined graph regularization utilizes the temporal vicinity of samples to promote temporal smoothness.

  3. An efficient algorithm based on ADMM is given for extracting the low-rank task-related source patterns.

2 Inverse Problem and Temporal Graph Structures

2.1 The Inverse Problem

The electrical signal of the cortical sources propagates to the EEG sensors through a head conductivity model, which can be described by a linear mapping called the lead field matrix:

$$\begin{aligned} X= LS + E \end{aligned}$$
(1)

where \(X \in {\mathbb {R}^{{N_c} \times {N_t}}} \) is the EEG data measured by \(N_c\) electrodes at \(N_t\) time points, \( L \in {\mathbb {R}^{{N_c} \times {N_d}}}\) is the lead field matrix that maps the source signal to the sensors on the scalp (each column of L represents the electrical field that a unit source at a particular location generates at all EEG electrodes), and \(S \in {\mathbb {R}^{{N_d} \times {N_t}}}\) represents the corresponding cortical potentials at the \(N_d\) source locations for the \(N_t\) time instants. Since the number of sources is much larger than the number of electrodes, solving for S given the EEG data X is ill-posed, with infinitely many feasible solutions, which necessitates a regularization term. Generally, S can be estimated by minimizing a cost function composed of a data fidelity term and a regularization term:

$$\begin{aligned} \arg \mathop {\min }\limits _S \left\| {E} \right\| _F^2 + \gamma \varTheta (S) \quad s.t. \quad X-LS = E, \end{aligned}$$
(2)

where \({\left\| \cdot \right\| _F}\) is the Frobenius norm. The penalty term \(\varTheta (S)\) encourages neurophysiologically plausible explanations and guarantees a unique solution.
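To make the notation concrete, the following sketch simulates the forward model (1) and computes a baseline estimate of the form (2) with \(\varTheta (S)=\left\| S \right\| _F^2\), for which a closed-form solution exists. The dimensions and the random stand-in for the lead field matrix are illustrative assumptions, not an actual BEM head model.

```python
# Minimal sketch of the forward model (1) and a Tikhonov-regularized
# estimate of the form (2) with Theta(S) = ||S||_F^2 (an MNE-style
# baseline). Dimensions and the random lead field are assumptions.
import numpy as np

rng = np.random.default_rng(0)
Nc, Nd, Nt = 64, 2000, 500           # electrodes, sources, time points
L = rng.standard_normal((Nc, Nd))    # stand-in for a BEM lead field
S_true = np.zeros((Nd, Nt))
S_true[100] = np.sin(np.linspace(0, 4 * np.pi, Nt))    # one active source
X = L @ S_true + 0.01 * rng.standard_normal((Nc, Nt))  # forward model (1)

# Closed-form l2-regularized estimate: S = L^T (L L^T + gamma I)^{-1} X.
gamma = 1e-2
S_hat = L.T @ np.linalg.solve(L @ L.T + gamma * np.eye(Nc), X)
```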

2.2 Temporal Graph Embedding

A graph encodes the geometric neighborhood relationships among data samples: each vertex represents one sample, and the weight of the edge between two vertices represents the similarity of the corresponding points [11]. Inspired by manifold theory [12], we use a regularization term to penalize the difference between neighboring source signals. In our previous work, we used a graph regularization term to promote intra-class consistency [7], but its assumption is too strong, requiring all reconstructed sources at different time points to have the same location and signal magnitude as long as they belong to the same class. We now define a temporal graph regularization as

$$\begin{aligned} R_t(S)=\sum \limits _{i,j = 1}^{N_t} \left\| {{s_i} - {s_j}} \right\| _2^2{W_{ij}}, \end{aligned}$$
(3)

where \(s_i\) is the i-th column of the matrix S, and a binary matrix W is designed as follows,

$$ W_{ij} = \begin{cases} 1, &{} \quad \text {if } s_i \in N_k(s_j) \text { or } s_j \in N_k(s_i) \\ 0, &{} \quad \text {otherwise.} \end{cases} $$

The graph embedding matrix W contains temporal vicinity information. \(N_k(s_i)\) is the set of the k temporally closest points to \(s_i\); in this paper, we set \(k=1\). This formulation forces neighboring source signals to have similar patterns. The benefits are twofold: one is temporal smoothness of the task-related activated sources; the other is the suppression of spurious sources, whose intermittent patterns would otherwise increase the cost of the objective function. By defining D as a diagonal matrix whose entries are the row sums of the symmetric matrix W, i.e., \(D_{ii} = \sum \nolimits _j {{W_{ij}}} \), and denoting \(G := D - W\), \(R_t(S)\) can be rewritten as:

$$\begin{aligned} R_t(S)&= \sum \limits _{i,j = 1}^{N_t} ({s_i}^Ts_i+{s_j}^Ts_j -2{s_i}^Ts_j )W_{ij} = 2\sum \limits _{i=1}^{N_t} s_i^T s_i D_{ii} - 2\sum \limits _{i,j=1}^{N_t} s_i^T s_j W_{ij}\nonumber \\&= 2{{\mathrm{tr}}}(SDS^T)-2{{\mathrm{tr}}}(SWS^T) =2{{\mathrm{tr}}}(SGS^T), \end{aligned}$$
(4)

where \({{\mathrm{tr}}}(\cdot )\) is the trace operator of a matrix, i.e., adding up all diagonal entries of a matrix.
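For concreteness, the following sketch builds the temporal graph of this section for \(k=1\), where W reduces to a symmetric banded binary matrix, and numerically checks identity (4); the dimensions are illustrative.

```python
# Sketch of the temporal graph in Sect. 2.2: with k = 1, each time sample
# is connected to its immediate neighbors, W is a symmetric banded binary
# matrix, and G = D - W is the Laplacian used in R_t(S) = 2 tr(S G S^T).
import numpy as np

def temporal_laplacian(Nt: int) -> np.ndarray:
    W = np.zeros((Nt, Nt))
    idx = np.arange(Nt - 1)
    W[idx, idx + 1] = 1.0            # s_i and s_{i+1} are neighbors
    W[idx + 1, idx] = 1.0            # keep W symmetric
    D = np.diag(W.sum(axis=1))       # D_ii = sum_j W_ij
    return D - W                     # G = D - W

# Sanity check of identity (4) on random data.
rng = np.random.default_rng(0)
S = rng.standard_normal((10, 6))
G = temporal_laplacian(6)
lhs = sum(np.sum((S[:, i] - S[:, j]) ** 2) * (abs(i - j) == 1)
          for i in range(6) for j in range(6))
assert np.isclose(lhs, 2 * np.trace(S @ G @ S.T))
```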

3 Proposed EEG Source Imaging Model

3.1 Decomposition of True and Spurious Sources

In general, two types of noise should be considered. One originates from inaccurate sensor measurements and is modeled by Gaussian white noise, denoted E in Eq. (1); the other is biological noise that comes directly from spontaneous activations in the source space, which are not task-related and are termed spurious sources. The second type of noise (spurious sources) contributes to the EEG signal in the same way as the true sources. This assumption makes sense since it is commonly known that, even in the resting state, our brain still generates EEG signals. A drawback of traditional models is that they do not separate the spurious sources from the true sources, so the estimated source can be composed of both task-related and spurious sources. To address this problem, we propose to use a low-rank constraint to extract the task-related activation. The decomposition of the source space and the whole procedure are illustrated in Fig. 1, where \(S_1\) has a low-rank property, \(S_2\) is sparse, and the sum of \(S_1\) and \(S_2\) is no longer low-rank, so X also loses the low-rank structure.

Fig. 1.

Extraction of the low-rank true sources from spurious sources: after gathering the MRI scans of the head, tissue segmentation is conducted, followed by mesh generation. By assigning conductivity values to the different tissues, with the electrodes co-registered to the mesh model, the boundary element method (BEM) is used to solve the forward model. Each triangle represents a brain source. The factual source signal S can be decomposed into two source matrices: the task-related true sources \(S_1\) have a low-rank property, and the spurious sources \(S_2\) are sparse but not temporally consistent. The low-rank source solution is projected onto cortex voxels to illustrate the activation pattern.

3.2 Low Rank Representation Model with Graph Regularization

We introduce our proposed model, called Low-Rank Representation with Temporal Graph structures for ESI (LRR-TG-ESI). The model is composed of a data fitting term that explains the EEG data, a temporal graph embedding regularization term that promotes temporal smoothness, an \(\ell _1\)-norm penalty for sparsity, and a nuclear-norm penalty for the low-rank structure of the true source. The objective function is given below:

$$\begin{aligned}&\min _{S,E} ||S||_ * + \lambda ||E||_{1,1} + \beta ||S||_{1,1} +\alpha {{\mathrm{tr}}}(S GS^T) \nonumber \\&\mathrm {s.t.}\quad X=LS+E, \end{aligned}$$
(5)

where \({\lambda , \beta , \alpha >0}\) are tuning parameters balancing the trade-off between the different terms. Our proposed model enforces row sparsity via the low-rank and sparse regularization, and temporal smoothness via the temporal graph regularization, while fitting the EEG data X. Although a graph regularization term was discussed in our earlier paper [7], it was not defined on the temporal manifold, and the previous definition in [7] made the strong assumption that drives the magnitudes of source signals to be similar within a class. To promote spatial smoothness, a total variation term could be imposed as another penalty, such as the first-order total variation (TV) regularization in [2, 13] or the fractional-order TV in [8, 14], and a similar algorithm can be derived under the ADMM framework; however, further investigation of spatially smoothing TV is left to future work.
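As a minimal sketch, the objective of (5) can be evaluated for a candidate S as follows, with E taken from the equality constraint; the function and parameter names are illustrative assumptions.

```python
# Sketch evaluating the objective of model (5) for a candidate source S;
# the nuclear norm is computed from the singular values.
import numpy as np

def lrr_tg_objective(S, X, L, G, lam, beta, alpha):
    E = X - L @ S                                  # equality constraint in (5)
    nuclear = np.linalg.svd(S, compute_uv=False).sum()
    return (nuclear
            + lam * np.abs(E).sum()                # ||E||_{1,1}
            + beta * np.abs(S).sum()               # ||S||_{1,1}
            + alpha * np.trace(S @ G @ S.T))       # temporal graph term
```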

4 Optimization Algorithm

To solve (5), an algorithm in the framework of ADMM is developed. The augmented Lagrangian function of (5) is

$$\begin{aligned} L(S,&M,E,T_1,T_2,\mu )= ||S||_ * + \lambda ||E||_{1,1} +\beta ||M||_{1,1} +\alpha {{\mathrm{tr}}}(S GS^T) \nonumber \\&+ \left\langle T_1,X - LS-E \right\rangle +\left\langle T_2,M-S\right\rangle + \frac{\mu }{2} \times (||X- LS-E||_F^2+||M-S||_F^2 ). \end{aligned}$$
(6)

By some simple algebra, (6) can be reformulated as

$$\begin{aligned} L(&S,M,E,T_1,T_2,\mu )= ||S||_ * + \lambda ||E||_{1,1} +\beta ||M||_{1,1} +\alpha {{\mathrm{tr}}}(S GS^T) \nonumber \\&+\frac{\mu }{2} \times ( ||X- LS-E + \frac{T_1}{\mu }||^2_F + ||M-S+ \frac{T_2}{\mu }||^2_F ) - \frac{1}{2\mu } (||T_1||^2_F + ||T_2||^2_F ), \end{aligned}$$
(7)

where \(T_1\) and \(T_2\) are Lagrangian multipliers and \(\mu \) is a positive scalar that can be used as a step size. M is an auxiliary variable for S. The inner product of two arbitrary matrices A and B is denoted \(\left\langle A, B \right\rangle \), which equals \({{\mathrm{tr}}}(A^TB)\). To minimize Eq. (7), the variables S, M, E, \(T_1\) and \(T_2\) can be updated alternately in a Gauss-Seidel manner by minimizing the augmented Lagrangian function with the other variables fixed. For notational simplicity, we rewrite Eq. (7) as:

$$\begin{aligned} L(&S,M,E,T_1,T_2,\mu )= ||S||_ * + \lambda ||E||_{1,1} +\beta ||M||_{1,1} \nonumber \\&+ h(S,E,M,T_1,T_2,\mu ) - \frac{1}{2\mu } (||T_1||^2_F + ||T_2||^2_F), \end{aligned}$$
(8)

where

$$\begin{aligned} h(S,E,M,T_1,T_2,\mu ) =&\alpha {{\mathrm{tr}}}(S GS^T) + \frac{\mu }{2} \times ( ||X- LS -E + \frac{T_1}{\mu }||^2_F \nonumber \\&+ ||M - S + \frac{T_2}{\mu }||^2_F ). \end{aligned}$$
(9)

If the augmented Lagrangian function is difficult to minimize with respect to a variable, a linearized approximate surrogate function can be used; hence the algorithm used here bears the name linearized alternating direction method [11, 15]. Updating S at iteration k with the other variables fixed amounts to minimizing the following objective:

$$\begin{aligned} L_{S} = ||S||_ * + h(S,E^k,M^k,T_1^k,T_2^k,\mu ^k), \end{aligned}$$
(10)

which is approximated by minimizing its linearization at \(S^k\) plus a quadratic proximal term:

$$\begin{aligned} S = \mathop {{{\mathrm{argmin}}}}\limits _{S} ||S||_* + \left\langle \nabla _{S}h(S^k), S-S^k \right\rangle + \frac{\eta }{2}\left\| {{S} - S^k} \right\| _F^2. \end{aligned}$$
(11)

Here \(\eta \) is a constant satisfying

$$\begin{aligned} \eta > 2\alpha ||G||_2 + \mu (1+||L||^2_2), \end{aligned}$$
(12)

where \(||\cdot ||_2\) is the spectral norm of a matrix, i.e., its largest singular value. As long as (12) is satisfied, (11) is a good approximation of (10). The solution to (11) has a closed form given by the singular value thresholding (SVT) operator [16]:

$$\begin{aligned} S^{k+1} = \varTheta _{\eta ^{-1}}(S^k -\nabla _{S}h(S^k) /\eta ), \end{aligned}$$
(13)

where \(\varTheta _\varepsilon (A) = US_\varepsilon (\varSigma )V^T\) is the SVT operator, in which \(U\varSigma V^T\) is the singular value decomposition of A and \(S_\varepsilon (x)\) is defined as \({\text {sgn}}(x)\max (|x|-\varepsilon ,0)\). \(\nabla _{S}h(S^k)\) is calculated as

$$\begin{aligned} \nabla _{S}h(S^k)&=\alpha (S^kG +S^kG^T) + \mu L^T(LS^k -X+E-\frac{T_1}{\mu }) + \mu (S^k -M-\frac{T_2}{\mu }) \end{aligned}$$
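A minimal sketch of this S-update follows, combining the step-size bound (12), the gradient above, and the SVT step (13); variable names mirror the text and are otherwise assumptions.

```python
# Sketch of the S-update (13): gradient of h at S^k followed by singular
# value thresholding, with eta chosen slightly above the bound (12).
import numpy as np

def svt(A, eps):
    """Singular value thresholding: shrink the singular values by eps."""
    U, sig, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(sig - eps, 0.0)) @ Vt

def update_S(S, M, E, T1, T2, X, L, G, alpha, mu):
    eta = 1.01 * (2 * alpha * np.linalg.norm(G, 2)
                  + mu * (1 + np.linalg.norm(L, 2) ** 2))   # bound (12)
    grad = (alpha * (S @ G + S @ G.T)                       # gradient of h
            + mu * L.T @ (L @ S - X + E - T1 / mu)
            + mu * (S - M - T2 / mu))
    return svt(S - grad / eta, 1.0 / eta)                   # SVT step (13)
```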

Updating M and E is equivalent to solving the following problems:

$$\begin{aligned} \mathop {{{\mathrm{argmin}}}}\limits _{M} \frac{\mu }{2} ||M-S+ \frac{T_2}{\mu }||^2_F + \beta ||M||_{1,1} \end{aligned}$$
(14)
$$\begin{aligned} \mathop {{{\mathrm{argmin}}}}\limits _{E} \frac{\mu }{2} ||X- LS-E + \frac{T_1}{\mu }||^2_F + \lambda ||E||_{1,1} \end{aligned}$$
(15)

The general form of (14)–(15) is an \(\ell _1\)-norm proximal operator, defined as

$$\begin{aligned} {\text {pro}}{{\text {x}}_\mu }(V) = \arg \mathop {\min }\limits _X \mu ||X||_{1,1} + \frac{1}{2}|| {X - V} ||_F^2, \end{aligned}$$
(16)

with \(\mu >0\). Problem (16) has a closed-form solution, called soft thresholding, defined by the shrinkage function

$${\text {shrink}}(V,\mu ) = \left( {\left| V \right| - \mu } \right) _+ {\text {sgn}} (V),$$

where \((x)_+\) equals x when \(x>0\) and 0 otherwise; the shrinkage function is applied element-wise. Problems (14)–(15) thus have closed-form solutions expressed with the shrinkage function. After updating all the variables, the Lagrange multipliers are updated by

$$\begin{aligned} \begin{aligned} T_1 =T_1+\mu (X- LS-E) , \quad T_2 =T_2+\mu (M-S). \end{aligned} \end{aligned}$$
(17)

The parameter \(\mu \) is updated by \(\mu = \min (\rho \mu , \mu _{max}) \). The complete procedure is summarized in Algorithm 1, where we initialize S with the estimate \(S_0\) from an \(\ell _{1}\) solver.

Algorithm 1. LRR-TG-ESI solved by linearized ADMM.
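The following sketch assembles the full iteration of Algorithm 1 from the pieces derived above, reusing svt and update_S from the previous sketch; the zero initialization, \(\rho \), \(\mu _{max}\), and iteration count are illustrative choices rather than the paper's exact settings.

```python
# End-to-end sketch of Algorithm 1 (linearized ADMM for model (5)),
# reusing svt() and update_S() from the S-update sketch above.
import numpy as np

def shrink(V, tau):
    """Element-wise soft thresholding (the prox of tau * ||.||_{1,1})."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def lrr_tg_esi(X, L, G, lam=0.01, beta=0.01, alpha=0.01,
               mu=1e-2, rho=1.1, mu_max=1e6, n_iter=200):
    Nc, Nt = X.shape
    Nd = L.shape[1]
    S = np.zeros((Nd, Nt))     # the paper warm-starts S from an l1
    M = np.zeros_like(S)       # solver; zeros keep the sketch
    E = np.zeros((Nc, Nt))     # self-contained
    T1, T2 = np.zeros_like(E), np.zeros_like(S)
    for _ in range(n_iter):
        S = update_S(S, M, E, T1, T2, X, L, G, alpha, mu)  # SVT step (13)
        M = shrink(S - T2 / mu, beta / mu)                 # prox step (14)
        E = shrink(X - L @ S + T1 / mu, lam / mu)          # prox step (15)
        T1 = T1 + mu * (X - L @ S - E)                     # multipliers (17)
        T2 = T2 + mu * (M - S)
        mu = min(rho * mu, mu_max)                         # step-size update
    return S, E
```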

It is worth noting that the data fitting term used in our model is the \(\ell _{1,1}\) norm of E, though there are other options. Generally, if the Gaussian noise E is small, the Frobenius norm \(||E||_F\) is an appropriate choice; for random data corruption, the \(\ell _{1,1}\) norm should be used; and for sample-specific data corruption, the \(\ell _{2,1}\) norm [17,18,19] should be used.

The convergence of Algorithm 1 follows directly from [15].

5 Numerical Experiments

In this section, we conducted two experiments to illustrate the effectiveness of the proposed method. First, we tested different values of \(\lambda \) and \(\beta \) while setting \(\alpha =0\), and then illustrated the temporal smoothing effect of the graph regularization term on a corrupted source with abrupt signal jumps. In the second experiment, we give comprehensive numerical results comparing our algorithm against benchmark algorithms to showcase its effectiveness in reconstructing task-related sources; we show that our algorithm can not only find the activated locations, but also reconstruct the time course of the source activation with high precision.

5.1 Experiments 1: Test LRR with Temporal Graph Prior

At each location, a time series of length 500 was generated to represent the source activation time course. At each time point, two randomly picked sources are activated to simulate non-task-related spurious noise with mean 0 and variance 1. The task-related activation pattern has a low-rank property; however, the noise-corrupted source space is no longer low-rank. We repeated the experiment 50 times for all combinations of \(\lambda \) and \(\beta \), where \(\lambda = \{0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.5 \}\) and \(\beta =\{0.005, 0.01, 0.015, 0.02, 0.05, 0.1\}\). The reconstruction error (RE) metric used here is \(\text {RE}=||\hat{S} -S||_2/||S||_2\). The RE for different values of \(\lambda \) and \(\beta \) is given in Fig. 2. Next, we solved the LRR-TG-ESI problem (5) with the graph regularization term to test its impact on the reconstructed signal, assigning the values \(\{ 0.01,0.02, 0.05,0.1,0.5\}\) to the graph regularization parameter \(\alpha \). The original source signal was smooth and then corrupted by random noise at some time points; there were again two randomly picked activated sources representing spurious sources with mean 0 and variance 1. The "temporal smoothing" effect of the graph regularization is shown in Fig. 3, where \(\lambda =0.02\) and \(\beta = 0.01\). Since the original signal is corrupted and not smooth at some time points, we set the neighborhood size to 2 (the closest samples before and after the one to be estimated) when calculating the Laplacian matrix. It is evident from formulation (3) that the graph regularization term decreases the dissimilarity of temporally neighboring reconstructed sources. If \(\alpha \) is set to 0.5, the graph regularization penalizes the curvature of the reconstructed signal heavily, as illustrated in Fig. 3. We can see that with the temporal graph prior, the reconstructed source is smoother. It is worth noting that the main purpose of the temporal graph prior is not to smooth the time courses of the activated locations, but to filter out spurious activations that are short transients with abrupt jumps. Combined with the low-rank prior, the temporal graph prior can filter out the spurious activations and reconstruct the task-related activated sources. The randomly planted spurious sources are filtered out by penalizing the graph regularization and nuclear norm terms, and in most cases the final rank of 2 is achieved within a wide range of parameters. A sketch of this simulation setup is given below.
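The sketch assumes an illustrative number of sources and task waveforms: a rank-2 task-related matrix \(S_1\) plus spurious sources \(S_2\) with two random activations per time point.

```python
# Minimal sketch of the Experiment-1 simulation: a rank-2 task-related
# source matrix plus spurious sources (2 random locations per time point,
# zero mean, unit variance). Nd and the waveforms are assumptions.
import numpy as np

rng = np.random.default_rng(0)
Nd, Nt = 500, 500
t = np.linspace(0, 1, Nt)
S1 = np.zeros((Nd, Nt))              # low-rank task-related sources
S1[10] = 5 * np.sin(2 * np.pi * 5 * t)
S1[200] = 5 * np.hanning(Nt)
S2 = np.zeros((Nd, Nt))              # sparse spurious sources
for j in range(Nt):
    S2[rng.choice(Nd, 2, replace=False), j] = rng.standard_normal(2)
S = S1 + S2                          # corrupted source space, no longer rank-2
```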

Fig. 2.

Averaged reconstruction error and rank over 50 experiments for varying \(\lambda \) and \(\beta \).

Fig. 3.

Illustration of the smoothing effect of temporal graph regularization: reconstructed time courses for varying graph regularization parameters.

5.2 Experiments 2: Comprehensive Comparison with Benchmark Algorithms

The purpose of the previous numerical experiments was to validate each term of the objective function and to understand its properties; the trade-off between low-rankness, data fidelity, sparsity, and temporal smoothness was fully explored by varying the parameters. In this part, a comprehensive study is conducted to compare the proposed algorithm with popular ESI algorithms such as MNE [20], sLORETA [21], and two MCE algorithms, Homotopy and FISTA [7, 22]. For easy validation, we generated independent sources in different ROIs, with 2 or 3 independent sources corresponding to different ranks of the ground-truth source; the spurious sources were generated by randomly activating sources on the cortex with random scalars of mean 0 and variance 1. Moreover, noise at the sensor level was also added to the EEG data.

To measure performance, we introduce five metrics: (1) CPU time in seconds; (2) rank of the estimated source; (3) sparsity, measuring the number of nonzero elements in the source space at each time point; (4) reconstruction error (RE), defined in Sect. 5.1; and (5) localization error (LE), calculated using a shortest-path algorithm over the irregular mesh from the reconstructed source location to the ground-truth location. The LE metric is the most important one, since it measures the discrepancy in location; the other metrics characterize other properties of the rendered solution. To calculate LE for each ROI with activated sources, we first locate the source with the largest activation magnitude in that ROI and compute the shortest-path distance from it to the ground-truth location. We conduct the same procedure for all activated ROIs and average the distances at each time point. The final LE is the distance averaged over all 500 time points of each experiment.
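A sketch of how RE and LE can be computed is given below, assuming the mesh adjacency (with edge lengths as weights) is available as a sparse matrix; the helper names are hypothetical.

```python
# Sketch of the RE and LE metrics of Sect. 5. LE uses a shortest-path
# distance over the cortical mesh, as described in the text.
import numpy as np
from scipy.sparse.csgraph import shortest_path

def reconstruction_error(S_hat, S):
    return np.linalg.norm(S_hat - S, 2) / np.linalg.norm(S, 2)

def localization_error(S_hat, mesh_adjacency, roi_vertices, truth_vertex):
    """LE for one ROI: mesh distance from the strongest reconstructed
    source in the ROI to the ground-truth vertex."""
    dist = shortest_path(mesh_adjacency, indices=[truth_vertex])[0]
    peak = roi_vertices[np.argmax(np.abs(S_hat[roi_vertices]).max(axis=1))]
    return dist[peak]
```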

For our proposed algorithm, we set \(\lambda = 0.01\) and \(\beta = 0.01\), which were found to perform well in the previous experiment for the same setting (rank 2 with 2 spuriously activated sources); the graph parameter \(\alpha \) was also set to 0.01. Ten experiments were conducted under the same setting, and the performance of all algorithms is summarized in Tables 1 and 2 for true ranks of 2 and 3, respectively. The SNR was calculated after the noise signal was generated and averaged over the 10 experiments. As can be seen from the tables, our algorithm locates the task-related activated sources most accurately, while its CPU time lies between those of the Homotopy and FISTA algorithms.

Table 1. Source reconstruction performance comparison (Rank = 2)
Table 2. Source reconstruction performance comparison (Rank = 3)

6 Conclusion

In this paper, unlike traditional models, we propose to estimate the latent task-related sources that are corrupted by spurious sources. To extract the discriminative task-related source activation pattern, we propose a new EEG source imaging model based on temporal graph structures and low-rank representation, and solve it with an algorithm based on ADMM. Numerical experiments verified the effectiveness of the proposed work in discovering task-related low-rank sources.