Introduction

Over the past 20 years, the development of functional magnetic resonance imaging (fMRI) technology has greatly advanced neuroimaging research and attracted significant interest in the human brain mapping field. Specifically, task-based fMRI (tfMRI) has been widely employed to identify brain regions activated during performance of a specific task, and has greatly advanced the understanding of the brain’s functional localization and interactions (Bullmore and Sporns 2009; Karl J. Friston 2009; Logothetis 2008). Recently, a growing body of studies has reported evidence that diverse brain activities exist simultaneously under specific task conditions (Michael D Fox et al. 2005; He 2013; Buxton et al. 2004). For example, Fox and colleagues identified both task-related activations and deactivations under attention-demanding cognitive tasks (Michael D Fox et al. 2005); He (He 2013) investigated the interaction between task-evoked and spontaneous brain activities; Lv and colleagues (Lv et al. 2015b) reported concurrent task-evoked and intrinsic (resting state) networks in specific task conditions; and Buxton and colleagues observed different delayed response patterns in tasks (Buxton et al. 2004). However, the currently dominant task-based fMRI analysis approach, the general linear model (GLM), is still limited in detecting these diverse and concurrent brain responses. Firstly, GLM (and other subtraction-based methods) rests on the premise that task-evoked brain regions can be discovered by subtracting the activity of a control condition (K. J. Friston et al. 1994; Mastrovito 2013). To increase the signal-to-noise ratio (Mastrovito 2013; M. D. Fox and Raichle 2007), experimental and control trials are repeated several times and signals are smoothed as a common practice. Such an approach enhances the dominant task-evoked response through averaging (M. D. Fox and Raichle 2007), but largely overlooks the diversity of concurrent brain activities such as delayed task-evoked responses and intrinsic brain activities. Another potential limitation is that GLM assumes that brain activity follows the basic task paradigm regressor (the convolution of the task paradigm with the hemodynamic response function) and does not account for possible variation during information processing; the hypothesized regressors in the design matrix may therefore be too restrictive (Logothetis and Wandell 2004; Logothetis 2008). Furthermore, there are no explicit activity patterns for intrinsic networks. Therefore, the conventional GLM method is likely insufficient for recovering diverse and concurrent brain responses during performance of a specific task.

In contrast with these model-driven methods, a variety of data-driven approaches have been proposed to explore the intrinsic brain activities, including principal component analysis (PCA) (Andersen et al. 1999; Viviani et al. 2005), independent component analysis (ICA) (Mckeown et al. 1998; Biswal and Ulmer 1999), and sparse coding and dictionary learning based methods (Eavani et al. 2012; Abolghasemi et al. 2015; Jiang et al. 2015, 2016; Lv et al. 2015b; Lv et al. 2015a). Specifically, sparse representation and dictionary learning based methods (Lee et al. 2011; Abolghasemi et al. 2015; Eavani et al. 2012) have attracted increasing attention, as these methods are based on the biological findings that sparsity is more effective than independence in determining neural activity (Olshausen and Field 1996; Quiroga et al. 2005; Quiroga et al. 2008; Daubechies and Haxby 2009). Olshausen (Olshausen and Field 1996) discovered that a coding strategy of maximizing sparseness sufficiently accounts for the response properties of visual neurons. Similarly, Quiroga and his colleagues (Quiroga et al. 2008; Quiroga et al. 2005) showed that a subset of medial temporal lobe (MTL) neurons selectively and sparsely activated to different stimuli. These findings suggest that sparsity may be a potential principle in brain activity, which is quite consistent with the rationale of dictionary learning and sparse representation algorithms (Mairal et al. 2010; Wright et al. 2010) in the machine learning field. An important characteristic of these data-driven methods is that they do not require any prior hypothesized neuronal activity pattern or prior knowledge about the paradigm, and thus can be applied to resting state fMRI (Greicius et al. 2003; Fransson 2005) data analysis. That is, data-driven methods can be used to detect those intrinsic networks which do not have a predefined task paradigm. 
However, the popularity of purely data-driven methods in tfMRI is limited because they cannot incorporate task paradigm information into the algorithm (Calhoun et al. 2005; Zhao et al. 2015). Typically, as in (Jiang et al. 2015), the task paradigm is used only as a reference to select the components of interest after the algorithmic procedure. Such methods, though useful, do not utilize the task paradigm information directly during the algorithmic procedure.

In order to combine the advantages of both strategies, a few researchers have proposed hybrid methods that incorporate prior information directly into the algorithmic procedure (Calhoun et al. 2005; Zhao et al. 2015). Incorporating prior information into the algorithmic procedure has several potential advantages. First, components with constrained time courses can be compared directly across subjects because the predefined task-related time series provide a natural correspondence. In data-driven methods, such comparisons are currently achieved by manual sorting and combination (Calhoun et al. 2001). Second, the prior task paradigm information has proven useful in determining task-evoked brain activities in model-driven methods and can guide the algorithmic procedure, thus possibly preventing data-driven components from converging to a local minimum. However, a great challenge for current hybrid methods is how to make sufficient use of the valuable prior task paradigm information in guiding brain network identification. Similar to current model-driven methods, previous hybrid studies adopted only the basic task paradigm regressors (Calhoun et al. 2005; Zhao et al. 2015) and did not account for possible variation of the regressors, thus potentially limiting the power to detect task-evoked networks. To date, most tfMRI research has focused on estimating the shapes of hemodynamic response functions (HRFs) (Glover 1999; Woolrich et al. 2004; Lindquist et al. 2007; Liao et al. 2002) across different subjects and different tasks, assuming that the generated regressor is time-invariant and does not change over the course of the experiment (Marrelec et al. 2003; Lindquist et al. 2009; Kalus et al. 2015; Chen et al. 2015). These studies implicitly assume that all brain voxels are evoked directly by the same stimulus pattern, which may limit the interpretation of neural activities.
Indeed, many studies have reported a hierarchical structure in visual information processing (Felleman and Van Essen 1991; Van Essen et al. 1992; Vinckier et al. 2007), which suggests that not all brain voxels are evoked by the stimulus pattern directly. Instead, apart from the dominant brain response, the propagation of neural activity may also evoke concurrent brain activities. To alleviate these problems, in this paper we propose a novel extendable supervised dictionary learning (E-SDL) hybrid framework to explore the diverse concurrent brain networks in tfMRI data. Compared with previous methods, this work makes two major contributions. First, we propose to account for regressor variations in task-evoked network identification. Our rationale is that brain activity patterns are evoked not only directly by the stimulus patterns but also by their variations. Second, we add a regressor extension module to systematically estimate the possible neural regressors evoked by a specific task stimulus and develop a uniform network identification framework for tfMRI. Specifically, we introduce transformation operators to extend the possible regressor variations. In this way, the regressor variations that may be caused by the neural activity propagation process are taken into consideration, yielding a more general activation detection framework for tfMRI studies. By applying the proposed E-SDL framework to the publicly available task fMRI dataset of the Human Connectome Project (HCP) (Glasser et al. 2013), we observed voxel signals highly correlated with the systematically extended regressors and identified more group-wise consistent and diverse task-related brain networks as well as intrinsic connectivity networks (ICNs), which demonstrates the superiority of the proposed framework in exploring diverse and concurrent brain activities during performance of a specific task.

Materials and methods

Overview

Figure 1 summarizes the computational pipeline of the extendable supervised dictionary learning framework. First, the basic task paradigm regressor is systematically extended into regressor groups (Fig. 1a), as detailed in Section ‘Regressor extension’. Second, the whole-brain voxel signals are aggregated into a large signal matrix S (Fig. 1b). We then modify the code of the online dictionary learning package (Mairal et al. 2010), altering the dictionary learning procedure to learn an over-complete hybrid dictionary D and the corresponding coefficient matrix A (Fig. 1c). Specifically, the extended regressor group is fixed in the dictionary as \( {\mathbf{D}}_c \) (orange part in Fig. 1c) during the dictionary learning procedure, while the other part \( {\boldsymbol{D}}_l \) (green part in Fig. 1c) and the coefficient matrix A are iteratively learned from the fMRI signals. Each column of D represents the temporal pattern of a specific brain response network, and the corresponding row of A represents the spatial distribution of this network. Finally, we map each row vector of A back into the brain volume space to obtain the spatial distribution of each network (Fig. 1d).

Fig. 1

The computational pipeline of extendable supervised dictionary learning framework (E-SDL). a Regressor extension. The basic task paradigm regressor is systematically extended into regressor groups to account for possible variation. b fMRI signal matrix extraction. For each subject, the whole brain signals of voxels are aggregated into a signal matrix. c Supervised dictionary learning. The extended regressor group (fixed as \( {\mathbf{D}}_c \)) is kept constant during the dictionary learning procedure. d Mapping the coefficient matrix back into brain volume space. Each row in coefficient matrix A is mapped back into brain volume space. T-network represents generalized task-evoked network and I-network represents intrinsic network

Dataset and preprocessing

In this paper, we adopt five independent and publicly available tfMRI datasets from the Human Connectome Project (HCP) Q1 release (Barch et al. 2013; Van Essen et al. 2013) to test the proposed framework. Specifically, the five datasets are the emotion, gambling, language, relational, and social task fMRI datasets. HCP employs a broad battery of tasks to identify as many core functional “nodes” as possible in healthy adult brains and makes these data publicly available to the biomedical research community (Van Essen et al. 2013; Barch et al. 2013). These functional “nodes” relate to structural and functional connectivity as well as behavioral measurements. Currently, the HCP tfMRI dataset is one of the highest-quality publicly available data sources for the analysis of brain function, structure and connectivity over a large population of subjects. The detailed designs of these task paradigms are available in (Barch et al. 2013). In the Q1 release of the HCP tfMRI datasets, 68 subjects are available (Van Essen et al. 2013). Therefore, our experiments are based on these 68 subjects in each task dataset.

These HCP tfMRI datasets were acquired on a 3 T Siemens Skyra scanner with the following acquisition parameters: 220 mm FOV, in-plane FOV = 208 × 180 mm, flip angle = 52°, BW = 2290 Hz/Px, 2 × 2 × 2 mm spatial resolution, 90 × 104 matrix, 72 slices, TR = 0.72 s, TE = 33.1 ms. The preprocessing pipeline includes skull removal, motion correction, slice time correction, spatial smoothing, and global drift removal (high-pass filtering). All of these steps were implemented with FSL FEAT. For more details on the data acquisition protocol and preprocessing procedures, please refer to the literature (Van Essen et al. 2013; Barch et al. 2013). For comparison, the GLM-based activation results are derived using FSL FEAT at both the individual and group levels.

Regressor extension

In typical activation detection studies, the hypothetical brain responses (regressors) are modeled as the convolution of the stimulus function and a hemodynamic response function (HRF), as shown in Fig. 2a. These studies assume that all brain voxels are evoked directly by the same stimulus pattern, which limits the interpretation of neural activities. However, many researchers have reported a hierarchical structure in the brain’s neural information processing (Felleman and Van Essen 1991; Vinckier et al. 2007; Van Essen et al. 1992), which suggests that neural activity may change as it propagates. Motivated by these studies, we propose a novel regressor extension method based on the idea that the variety of brain responses may be caused by transformations of the dominant brain response pattern. As shown in Fig. 2b, the critical difference from previous methods is that we systematically extend the basic task paradigm regressor into a regressor group to account for possible regressor variations. The hypothesized regressor generating procedure can be modeled as follows:

$$ {r}_0(t)=h(t)\ast s(t) $$
(1)

where \( {r}_0(t) \) is the basic task paradigm regressor, which represents the dominant brain response pattern. It is the convolution of the stimulus pattern \( s(t) \) with the hemodynamic response function \( h(t) \).

Fig. 2

The pipeline of modeling hypothesized regressors. a Traditional methods; b Our method

$$ r(t)=\left[{r}_0(t),{r}_1(t),\dots, {r}_i(t)\right] $$
(2)
$$ {r}_i(t)=\mathcal{F}\left({r}_0(t)\right) $$
(3)

where \( r(t) \) is the extended regressor group and \( {r}_i(t) \) is the i-th extended regressor, obtained by applying a transformation operation \( \mathcal{F} \) to \( {r}_0(t) \).

Specifically, we first adopt a Gamma function to model the hemodynamic response function of the basic task paradigm, which has been demonstrated to be powerful in both theory and practice (Boynton et al. 1996; Karl J Friston et al. 1998; Hossein-Zadeh et al. 2003). Then we systematically extend the basic task paradigm regressor with derivative, integral, delay and anti (inverse) operations. The rationale behind the selection of these operations is twofold. First, these operations are well-recognized signal transformations in the traditional signal processing community and have been validated in many studies (Richard 2003; Scali and Rachid 1998). Second, neuroscience researchers have observed parts of BOLD signals that are similar to these transformations of the task paradigm regressors (Buxton et al. 2004; Valdes-Sosa et al. 2009; Ritter et al. 2009). However, as far as we know, the basic task paradigm regressor has not been systematically extended in previous studies. Concretely, we first expand the regressor into its derivative and integral forms (yellow and blue patterns in the second row of Fig. 1a). After that, we transform the three regressors into three corresponding groups of regressors with different delay times and apply the anti (inverse) operation to the extended regressor groups. In this way, the basic task paradigm regressor is systematically extended into regressor groups (the third row in Fig. 1a). Since the regressors in this study are extended in a systematic fashion, they may better represent the brain response patterns and provide valuable information for detecting diverse task-evoked brain activities. An important point is that the sparse coding and dictionary learning based method (Mairal et al. 2010) does not require the dictionary basis vectors to be orthogonal, which allows more flexibility in constructing the hypothetical regressors; the regressor basis may be further extended as knowledge about the mechanisms of neural response patterns increases.
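The extension procedure described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the Gamma shape and scale, the use of `np.gradient` and a cumulative sum for the derivative and integral, a delay grid starting at 0 s, and the final mean-centering and unit-norm scaling are all assumptions; the function names are hypothetical. The seven delays spaced 3 s apart follow the description in the Results section.

```python
import numpy as np

def gamma_hrf(tr, duration=30.0, shape=6.0, scale=1.0):
    """Single-Gamma HRF sampled every `tr` seconds (shape/scale are assumed values)."""
    t = np.arange(0.0, duration, tr)
    h = t ** (shape - 1.0) * np.exp(-t / scale)
    return h / h.sum()

def extend_regressor(stimulus, tr, delays_s=(0, 3, 6, 9, 12, 15, 18)):
    """Extend r0 = h * s with derivative, integral, delay and anti operations.

    Returns a (t x 42) matrix: 3 base forms x 7 delays x 2 signs.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    # Eq. (1): basic task paradigm regressor, truncated to the scan length.
    r0 = np.convolve(stimulus, gamma_hrf(tr))[: len(stimulus)]
    # Base forms: original, derivative, integral (discrete approximations).
    bases = [r0, np.gradient(r0), np.cumsum(r0) * tr]
    group = []
    for base in bases:
        for d in delays_s:
            shift = int(round(d / tr))
            # Delay by zero-padding the front and dropping the tail.
            delayed = np.concatenate([np.zeros(shift), base[: len(base) - shift]])
            group.append(delayed)
    # Anti (inverse) operation: sign-flipped copy of every regressor.
    group = group + [-g for g in group]
    R = np.stack(group, axis=1)
    # Assumed post-processing: zero mean and unit norm per column.
    R = R - R.mean(axis=0)
    norms = np.linalg.norm(R, axis=0)
    return R / np.where(norms > 0, norms, 1.0)
```

Scaling each column to unit norm is a design choice made here so that the extended regressors satisfy the atom-norm constraint when they are placed in the fixed part of the dictionary.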

Extendable supervised dictionary learning and sparse representation

In this paper, we propose an extendable supervised dictionary learning and sparse representation framework to explore diverse and concurrent brain activities under task conditions. While traditional methods assume that brain activities are evoked by the same original stimulus, E-SDL accounts for the possible neural activity propagation effects through a regressor extension procedure. Given a whole-brain tfMRI signal matrix \( \boldsymbol{S}\in {\mathbb{R}}^{t\times n} \) and the basic task paradigm regressor, where t is the number of fMRI time points and n is the number of voxels, E-SDL aims to extend the basic task paradigm regressor into a regressor series and to represent each signal \( {s}_i \) in S as a sparse linear combination of atoms in an over-complete hybrid dictionary D (Fig. 1c), i.e., \( {s}_i=\boldsymbol{D}\times {\alpha}_i \) and \( \boldsymbol{S}=\boldsymbol{D}\times \boldsymbol{A} \), where \( \boldsymbol{A}=\left({\alpha}_1,{\alpha}_2,\dots, {\alpha}_n\right) \) is the coefficient matrix. Specifically,

$$ \boldsymbol{D}=\left[{\mathbf{D}}_c,{\boldsymbol{D}}_l\right]\in {\mathbb{R}}^{t\times k},\kern0.75em {\mathbf{D}}_c\in {\mathbb{R}}^{t\times {k}_c},\kern0.75em {\boldsymbol{D}}_l\in {\mathbb{R}}^{t\times {k}_l} $$
(4)

where \( {\mathbf{D}}_c \) is the extended regressor series, which is fixed during the dictionary learning procedure, and \( {\boldsymbol{D}}_l \) contains the dictionary atoms learned from the signal matrix S. Here \( {k}_c \) is the number of fixed atoms and \( {k}_l \) is the number of learned atoms in D, so that \( k={k}_c+{k}_l \). In E-SDL, the empirical cost function is modeled as the average regression loss over the n signals.

$$ {f\kern-0.1em }_n\left(\boldsymbol{D}\right)\triangleq \frac{1}{n}\sum_{i=1}^n\ell \left({s}_i,\left[{\mathbf{D}}_c,{\boldsymbol{D}}_l\right]\right) $$
(5)

where the loss function \( \ell \left({s}_i,\left[{\mathbf{D}}_c,{\boldsymbol{D}}_l\right]\right) \) is the reconstruction error of the sparse representation of signal \( {s}_i \). In order to yield a sparse representation, we add an \( {\ell}_1 \) constraint \( \lambda {\left\Vert {\alpha}_i\right\Vert}_1 \). Here λ is a regularization parameter that controls the trade-off between the regression residual and the sparsity level.

$$ \ell \left({s}_i,\left[{\mathbf{D}}_c,{\boldsymbol{D}}_l\right]\right)\triangleq \frac{1}{2}{\left\Vert {s}_i-\left[{\mathbf{D}}_c,{\boldsymbol{D}}_l\right]{\alpha}_i\right\Vert}_2^2+\lambda {\left\Vert {\alpha}_i\right\Vert}_1 $$
(6)

In order to prevent the atoms of D from taking arbitrarily large values, we add the following constraint:

$$ C\triangleq \left\{\boldsymbol{D}\in {\mathbb{R}}^{t\times k}\kern0.75em \mathrm{s.t.}\kern0.75em \forall j=1,\dots, k,\kern0.5em {d}_j^T{d}_j\le 1\right\} $$
(7)
$$ \underset{\boldsymbol{D}\in C,\boldsymbol{A}\in {\mathbb{R}}^{k\times n}}{\min}\frac{1}{2}{\left\Vert \boldsymbol{S}-\left[{\mathbf{D}}_c,{\boldsymbol{D}}_l\right]\boldsymbol{A}\right\Vert}_2^2+\lambda {\left\Vert \boldsymbol{A}\right\Vert}_{1,1} $$
(8)

Thus the whole problem can be solved as the matrix factorization problem in Eq. (8). To solve this problem, we modify the code of the online dictionary learning package (Mairal et al. 2010) to form the extendable supervised dictionary learning framework of this paper. The E-SDL framework provides an effective way to learn a hybrid dictionary and coefficient matrix, and is summarized in Algorithm 1. The convergence condition in our implementation is that the difference between \( {\mathrm{D}}_{t-1} \) and \( {\mathrm{D}}_t \) (here the subscript denotes the iteration index) is small enough, at which point \( {\mathrm{D}}_t \) is no longer updated. In our implementation, the dictionary size k and the number of time points t satisfy k ≪ n and t ≪ n, which guarantees the convergence of the training procedure (Mairal et al. 2010).
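To make the structure of Eq. (8) concrete, the following is a minimal batch sketch of learning a hybrid dictionary whose first \( {k}_c \) columns stay fixed. It is not the modified online dictionary learning code used in this paper: the online updates are replaced here by plain alternation between ISTA sparse coding and a block-coordinate atom update restricted to the learned columns, the squared norm in Eq. (8) is taken as the sum of squared entries, and all function names and default parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(x, thr):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def sparse_code(S, D, lam, n_iter=100):
    """ISTA for min_A 0.5 * ||S - D A||^2 + lam * ||A||_{1,1} with D fixed."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], S.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - D.T @ (D @ A - S) / L, lam / L)
    return A

def esdl(S, D_c, k_l, lam=0.1, n_outer=20, tol=1e-4, seed=0):
    """Learn D = [D_c, D_l] and A; the columns of D_c are never modified.

    Assumes the columns of D_c already satisfy d_j^T d_j <= 1.
    """
    rng = np.random.default_rng(seed)
    t, k_c = D_c.shape
    D_l = rng.standard_normal((t, k_l))
    D_l /= np.linalg.norm(D_l, axis=0)
    D = np.hstack([D_c, D_l])
    for _ in range(n_outer):
        A = sparse_code(S, D, lam)
        D_prev = D.copy()
        B = S @ A.T  # (t x k) data/coefficient cross term
        G = A @ A.T  # (k x k) coefficient Gram matrix
        # Update only the learned atoms, one column at a time.
        for j in range(k_c, D.shape[1]):
            if G[j, j] < 1e-12:
                continue  # atom unused by every signal: leave it unchanged
            u = D[:, j] + (B[:, j] - D @ G[:, j]) / G[j, j]
            D[:, j] = u / max(np.linalg.norm(u), 1.0)  # project onto the set C
        # Stop when the dictionary no longer changes appreciably.
        if np.linalg.norm(D - D_prev) < tol:
            break
    return D, sparse_code(S, D, lam)
```

Restricting the atom update to the columns with index `j >= k_c` is what makes the dictionary supervised in this sketch: the task-derived atoms act as fixed regressors while the remaining atoms adapt to the data.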

Algorithm 1 Extendable supervised dictionary learning and sparse coding framework.


Identification of brain networks

With the preserved voxel index information, each row vector of the coefficient matrix A can be mapped back into the brain volume space, as shown in Fig. 1d, representing the spatial distribution of the corresponding dictionary column. Since part of D is fixed in the dictionary learning procedure, A is naturally divided into generalized task-evoked brain networks \( {\boldsymbol{A}}_c \) corresponding to \( {\mathbf{D}}_c \) and data-driven networks \( {\boldsymbol{A}}_l \) corresponding to \( {\boldsymbol{D}}_l \). With the help of the fixed task regressor basis \( {\mathbf{D}}_c \), it is straightforward to map all the generalized task-evoked brain networks from \( {\boldsymbol{A}}_c \) for each subject in each task dataset. After that, we inspect all the generalized task-evoked networks and examine the group-wise consistency of each network. In this way, networks with consistent activation across subjects are identified as meaningful generalized task-evoked networks.
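The bookkeeping behind this mapping can be illustrated with a short sketch. It assumes that the fMRI data are available as a 4-D array and that a boolean brain mask defines which voxels enter S; both function names are hypothetical, and file loading and thresholding of the resulting maps are left out.

```python
import numpy as np

def signals_from_volume(data4d, mask):
    """Stack the in-mask voxel time series into S (t x n).

    data4d: array of shape (x, y, z, t); mask: boolean array of shape (x, y, z).
    The voxel order follows the C-order traversal of the mask.
    """
    return data4d[mask].T

def networks_to_volumes(A, mask, k_c):
    """Map each row of A (k x n) back into the volume and split by dictionary part.

    Returns (task_maps, intrinsic_maps) with shapes (k_c, x, y, z) and
    (k - k_c, x, y, z); voxels outside the mask are zero.
    """
    vols = np.zeros((A.shape[0],) + mask.shape)
    vols[:, mask] = A  # same voxel order as in signals_from_volume
    return vols[:k_c], vols[k_c:]
```

Because both functions index with the same mask, row i of A always lands on the voxels whose signals formed the columns of S, which is the preserved index information referred to above.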

On the other hand, the data-driven networks in \( {\boldsymbol{A}}_l \) are learned in an unsupervised way from individual subjects, which makes them hard to interpret group-wise. A spatial matching method is therefore adopted to compare the data-driven networks in \( {\boldsymbol{A}}_l \) with the well-established intrinsic connectivity network (ICN) templates in the literature (Smith et al. 2009) in order to detect intrinsic connectivity networks. The spatial similarity is defined as:

$$ R\left( X, T\right)=\frac{\left| X\cap T\right|}{\left| T\right|} $$
(9)

where X is the learned spatial network from \( {\boldsymbol{A}}_l \), T is the ICN template, and \( \left|\cdot \right| \) denotes the number of voxels in a set.
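As a worked instance of Eq. (9), the overlap rate can be computed on binarized maps as sketched below. How the maps are binarized is not specified here, so the thresholds `x_thr` and `t_thr` (and the function name) are assumptions introduced only for illustration.

```python
import numpy as np

def overlap_rate(X, T, x_thr=0.0, t_thr=0.0):
    """Spatial overlap rate R(X, T) = |X ∩ T| / |T| on binarized maps.

    X: learned network map, T: template map (same shape).
    x_thr and t_thr are assumed binarization thresholds.
    """
    X_bin = np.asarray(X) > x_thr
    T_bin = np.asarray(T) > t_thr
    n_template = T_bin.sum()
    if n_template == 0:
        return 0.0  # empty template: define the overlap as zero
    return float((X_bin & T_bin).sum() / n_template)
```

Note that the measure is normalized by the template size only, so it is not symmetric in X and T: a network that covers the whole template scores 1 regardless of how far it extends beyond it.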

Results

In this study, the proposed E-SDL framework has been tested on five independent and publicly available tfMRI datasets from the HCP Q1 release (Barch et al. 2013; Van Essen et al. 2013). The five datasets are the emotion, gambling, language, relational, and social task datasets. For each task, subtle group-wise consistent task-evoked brain networks and intrinsic connectivity networks were simultaneously identified with the proposed extendable supervised dictionary learning framework. The following parts detail the analysis of the extended regressors and of the reconstruction error, as well as the identified task-evoked and intrinsic networks.

Analysis of extended regressors

In the E-SDL framework, we systematically extend the basic task paradigm regressor into regressor groups with delay, derivative, integral and anti (inverse) operations, which are among the most common transformations in signal propagation. Specifically, we first extend it with the derivative and integral operations and then further extend the resulting three regressors with 7 different delay times spaced 3 s apart. Afterwards, we extend these regressors with the anti (inverse) operation. In total, we therefore obtain 3 × 7 × 2 = 42 regressors for each basic task paradigm regressor. Here, we take the emotion task dataset as an example. Fig. 3 shows the extended regressor group from one randomly selected subject in the emotion task. Regressors 1–7 (first column) are the extended task paradigm regressors with different delay times. The second column shows the extended regressors with the derivative operation and different delay times, and the third column shows the extended regressors with the integral operation and different delay times. Columns 4–6 are the regressors of columns 1–3 after the inverse operation. Thus the basic task paradigm regressor is systematically extended into regressor groups to account for possible regressor variation. In conventional model-driven methods (GLM), the basic idea for identifying a task-evoked network is to use multiple linear regression to search for voxels correlated with the hypothesized regressor (Grinband et al. 2008). Therefore, a meaningful hypothesized regressor should be highly correlated with part of the voxel signals in the brain. To examine the extended regressors, we calculated the Pearson correlation between the hypothesized regressors and the voxel signals in real fMRI data. Fig. 4 shows the extended regressors and their numbers of correlated voxels for one randomly selected subject from the HCP emotion task. In our current experiment, a voxel is considered correlated with a regressor if the Pearson correlation coefficient between the regressor and the voxel signal is over 0.50. It is interesting to see that a few hypothesized regressors have a relatively large number of correlated voxels, which suggests that these extended regressors are highly correlated with brain activity patterns. More detailed quantitative measurements are available in Supplemental Table 1. These highly correlated voxels in fMRI data suggest that it is reasonable to systematically extend regressors in such a fashion.
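The correlated-voxel count used in this analysis can be sketched as follows. The 0.50 threshold comes from the text; the function names are hypothetical, and treating constant time series as uncorrelated with everything is an assumption made to avoid division by zero.

```python
import numpy as np

def _zscore(X):
    """Column-wise z-score; constant columns are mapped to all zeros."""
    X = X - X.mean(axis=0)
    sd = X.std(axis=0)
    return X / np.where(sd > 0, sd, 1.0)

def correlated_voxel_counts(R, S, r_thr=0.5):
    """Count, for each regressor, the voxels whose Pearson r exceeds r_thr.

    R: (t x m) extended regressors, S: (t x n) voxel signals.
    Returns an integer array of length m.
    """
    corr = _zscore(R).T @ _zscore(S) / R.shape[0]  # (m x n) Pearson matrix
    return (corr > r_thr).sum(axis=1)
```

Only positive correlations are counted in this sketch; a voxel that is anti-correlated with a regressor is picked up instead by the corresponding anti (inverse) regressor in the extended group.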

Fig. 3

Systematically extended regressors of one randomly selected subject in emotion task. Each column is the extended regressors of a regressor extension operation and each row represents a different delay time

Fig. 4

Examples of the extended regressors and correlated voxel number from one randomly selected subject in the HCP emotion task. The horizontal coordinate axis represents the extended regressor index and the vertical coordinate axis represents the correlated voxel number

Analysis of reconstruction error

In order to examine the performance of the sparse representation of fMRI signals, we first calculated the Pearson correlation coefficients between the original signals and the signals reconstructed by E-SDL and by GLM. Fig. 5 illustrates these coefficients for one randomly selected subject from the emotion task. From Fig. 5, it is easy to see that the signals reconstructed by E-SDL are highly correlated with the original signals, with most Pearson correlation coefficients higher than 0.90, which is much higher than for the signals reconstructed by GLM. In addition, we characterized the reconstruction error as defined in Eq. (10). The detailed quantitative measurements of the reconstruction errors and Pearson correlation coefficients across all subjects and all tasks are listed in Supplemental Table 2.

$$ error=\frac{1}{n}{\left\Vert \boldsymbol{S}-\widehat{\boldsymbol{S}}\right\Vert}_2^2 $$
(10)
Fig. 5

Illustration of the performance in reconstructing the fMRI signals by different methods. a The Pearson correlation coefficients between the reconstructed signals by E-SDL and the original signals. b The Pearson correlation coefficients between the reconstructed signals by GLM and the original signals

For comparison, we also calculated the residual error and Pearson correlation coefficients between the reconstructed signals and original signals across all subjects and tasks for GLM; the detailed quantitative results are listed in Supplemental Table 3. Comparing Supplemental Tables 2 and 3 shows that the signals reconstructed by E-SDL match the original signals much better than those reconstructed by GLM in terms of Pearson correlation coefficients, which suggests that E-SDL is better at representing the original fMRI signals.
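Both reconstruction measures can be computed from a dictionary and its coefficients as sketched below. The squared norm in Eq. (10) is interpreted here as the sum of squared entries (Frobenius norm), which is an assumption, and the function name is hypothetical. The same function applies to a GLM fit by passing the design matrix as `D` and the regression coefficients as `A`.

```python
import numpy as np

def reconstruction_quality(S, D, A):
    """Return (error, corr) for the reconstruction S_hat = D @ A.

    error: squared reconstruction error averaged over the n voxels, as in Eq. (10).
    corr:  length-n array of Pearson correlations between each original
           voxel signal and its reconstruction (0 where undefined).
    """
    S_hat = D @ A
    n = S.shape[1]
    error = np.linalg.norm(S - S_hat, "fro") ** 2 / n
    Sc = S - S.mean(axis=0)
    Hc = S_hat - S_hat.mean(axis=0)
    denom = np.linalg.norm(Sc, axis=0) * np.linalg.norm(Hc, axis=0)
    corr = np.divide((Sc * Hc).sum(axis=0), denom,
                     out=np.zeros(n), where=denom > 0)
    return error, corr
```

The two measures are complementary: the Pearson correlation ignores offset and scale, so a reconstruction can correlate perfectly with the original signal while still having a non-zero error.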

In order to examine whether the good reconstruction result is caused by overfitting, we conducted a further experiment in which only part of the fMRI signals was used to train the dictionary. Specifically, we extracted a half and a quarter of the original signals by uniform sampling, used each subset to train a dictionary, and then reconstructed all of the original fMRI signals with it. The original fMRI signals can be well reconstructed from both trained dictionaries. The result, presented in the same way as Fig. 5, is provided in Supplemental Fig. 5. This experiment indicates that the good reconstruction result is not due to overfitting.

Identified group-wise consistent diverse and concurrent brain responses

Since the basic task paradigm regressor has been extended in a systematic way, E-SDL is able to detect subtle and meaningful generalized task-evoked brain responses as well as intrinsic connectivity networks. Here, we again take the emotion task dataset as an example. After the extendable supervised dictionary learning and brain network identification procedure defined in Section ‘Identification of brain networks’, we identified and confirmed nine generalized task-evoked networks (Fig. 6a) and nine intrinsic connectivity networks (Fig. 6b) in the emotion task. For each identified task-evoked network, the corresponding extended regressor and its index are shown in the upper row. Interestingly, these identified networks are quite consistent across the whole population. Fig. 7 shows the identified generalized task-evoked networks from 10 randomly selected subjects in the emotion task, and Fig. 8 illustrates the identified intrinsic connectivity networks from 5 randomly selected subjects in the emotion task. Figs. 7 and 8 show that the identified diverse and concurrent networks are reasonably consistent across subjects in the emotion task dataset.

Fig. 6

Identified group-wise consistent brain activation networks by E-SDL framework from one randomly selected subject in emotion task. a Extended regressors and corresponding spatial patterns of identified generalized task-evoked networks. b Identified nine ICNs

Fig. 7

Examples of identified generalized task-evoked networks in emotion task. Each column shows one extended regressor and corresponding task-evoked networks in ten randomly selected subjects

Fig. 8

Examples of identified ICNs in emotion task. Each row shows the ICN template, group-wise ICN network across all subjects and the corresponding ICN network in five randomly selected subjects

Similarly, we applied the same procedure to the gambling, language, relational and social task fMRI datasets. Specifically, we identified 8, 10, 9 and 9 generalized and group-wise consistent task-evoked networks for the gambling, language, relational and social task fMRI datasets, respectively, which are shown in Supplemental Figs. 1a and b, and nine group-wise consistent intrinsic connectivity networks, which are shown in Supplemental Fig. 2. In Supplemental Fig. 1, each identified network represents the group-wise activation map of the corresponding extended regressor, and in Supplemental Fig. 2 each identified ICN represents the group-average intrinsic connectivity activation map. For comparison, the corresponding intrinsic connectivity network templates (Smith et al. 2009) are listed in the first column.

In summary, the generalized task-evoked networks identified with the extended regressors have reasonably similar and consistent shapes across subjects in the emotion task dataset, as shown in Fig. 6. Similar results were also observed in the gambling, language, relational and social task fMRI datasets, as shown in Supplemental Fig. 1, which suggests that the E-SDL framework is capable of identifying subtle and diverse brain responses under task conditions. At the same time, consistent intrinsic networks could also be identified across subjects (Fig. 8) and across different tasks (Supplemental Fig. 2), indicating that the E-SDL framework can identify intrinsic networks simultaneously. Therefore, we conclude that E-SDL is effective in exploring diverse and concurrent hemodynamic brain responses.

Comparison with GLM

We further compared the activation maps identified by GLM with the corresponding generalized task-evoked networks identified by E-SDL (Supplemental Fig. 3). For each task in Supplemental Fig. 3, the upper row is the GLM activation map and the lower row is the corresponding network identified by E-SDL. The corresponding networks are quite consistent with the GLM activation maps at both the individual level and the group level. More detailed quantitative measurements for each task are given in Supplemental Tables 4 and 5, where we adopted the spatial overlap rate defined in Eq. (9) to characterize the similarity between the GLM activation maps and the corresponding E-SDL networks. Supplemental Table 4 reports the spatial overlap rates at the individual level, while Supplemental Table 5 reports the spatial overlap rates between group-wise activation maps in each task. Both the spatial maps and the quantitative measurements show that the corresponding activation maps in E-SDL are quite consistent with the GLM activation results, which indicates that the networks identified by E-SDL include the GLM activation detection results. Moreover, E-SDL is more powerful in identifying diverse and concurrent brain activities under task conditions.

Comparison with ICN templates

We also identified nine intrinsic connectivity networks (ICNs) across all subjects and tasks. Specifically, ICNs #1-#3 correspond to the medial, occipital pole and lateral visual areas of the visual cortex. ICN #4 includes the ventromedial frontal cortex and the bilateral inferior-lateral-parietal and medial parietal areas, and is known as the “default mode network”. ICN #5 corresponds to action-execution functions and includes the sensorimotor cortex, supplementary motor area and secondary somatosensory cortex. ICN #6 is known as the auditory network and includes the posterior insula, Heschl’s gyrus and superior temporal gyrus. ICN #7 includes the anterior cingulate and paracingulate areas. ICNs #8 and #9 cover the middle frontal and orbital areas. More details can be found in Smith et al. (2009). Supplemental Fig. 2 illustrates the identified group-wise intrinsic connectivity networks across different tasks. Supplemental Table 6 provides the spatial overlap rates between the identified ICNs and the corresponding templates across individual subjects in each task, and Supplemental Table 7 provides the spatial overlap rates between the identified group-wise ICNs and the corresponding templates. Both the spatial patterns and the quantitative measurements show that the identified intrinsic connectivity networks are quite consistent across different subjects and tasks. These results demonstrate that the E-SDL framework is capable of detecting meaningful intrinsic networks and task-evoked networks at the same time.

Reproducibility study and parameter effect

In dictionary learning studies, the dictionary size k and the sparsity level constraint coefficient λ are two important parameters of the model. To examine the effect of these parameters on the E-SDL framework, we tested a variety of combinations of k and λ. Specifically, we first fixed the dictionary size k at 400 and varied λ from 0.1 to 1.5, and then fixed λ at 0.3 and varied k from 100 to 800. The functional networks identified with the different parameter combinations for one randomly selected subject are visualized in Supplemental Fig. 4. The spatial overlap rates of the identified networks between different parameter combinations are provided in Supplemental Table 8.
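The two one-dimensional sweeps described above can be sketched as follows. The end points (k = 400 with λ from 0.1 to 1.5, and λ = 0.3 with k from 100 to 800) come from the text; the step sizes of 0.2 for λ and 100 for k are assumptions made only for illustration, and the function name is hypothetical.

```python
def parameter_sweeps(fixed_k=400, fixed_lam=0.3,
                     lam_start=0.1, lam_stop=1.5, lam_step=0.2,
                     k_start=100, k_stop=800, k_step=100):
    """Return two lists of (k, lambda) pairs: a lambda sweep and a k sweep.

    The step sizes are assumed values; the text only states the ranges.
    """
    n_lam = int(round((lam_stop - lam_start) / lam_step)) + 1
    # Rounding removes floating-point drift such as 0.30000000000000004.
    lam_sweep = [(fixed_k, round(lam_start + i * lam_step, 10))
                 for i in range(n_lam)]
    k_sweep = [(k, fixed_lam) for k in range(k_start, k_stop + 1, k_step)]
    return lam_sweep, k_sweep


lam_sweep, k_sweep = parameter_sweeps()
for k, lam in lam_sweep + k_sweep:
    # A dictionary learning run with this (k, lam) pair would go here.
    pass
```

Varying one parameter at a time keeps the number of runs linear in the number of settings, at the cost of not probing interactions between k and λ.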

Supplemental Fig. 4a-d shows that, with a fixed dictionary size, the identified task-evoked networks remain relatively stable over a range of λ settings (0.1 ≤ λ ≤ 0.7). Similarly, with a fixed λ, the identified task-evoked networks remain stable over a range of dictionary sizes (200 ≤ k ≤ 500). The identified intrinsic networks show similar results (Supplemental Fig. 4e-f). These results and Supplemental Table 8 show that, although there is slight spatial variation, the overall spatial patterns remain reasonably consistent and stable over a range of parameter settings. They demonstrate that the proposed method is reproducible and robust in identifying both task-evoked and intrinsic networks over a range of parameter combinations, and that the framework is not very sensitive to parameter settings.

We also noticed that as the dictionary size k and the sparsity level constraint coefficient λ continue to increase, the identified network components become increasingly sparse. This suggests that a large λ and a large k both increase the sparsity of the identified brain networks. However, optimizing these two parameters simultaneously remains an unsolved problem. Based on our current experience, when λ is less than 1 and k is between two and three times the number of time points, the identified networks are usually reasonably good. In addition, the consistency of the reconstructed networks across different subjects has been adopted as a selection criterion in previous studies (Jiang et al. 2015; Lv et al. 2015a).
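The empirical guideline above can be written as a simple check. This is only a sketch of the stated rule of thumb, not a parameter selection method; the function name is hypothetical, the lower bound λ > 0 is an added assumption, and the value of 176 time points is an illustrative number rather than one taken from the text.

```python
def within_rule_of_thumb(k, lam, n_timepoints):
    """True if lambda < 1 and k lies between 2x and 3x the number of time points.

    The lower bound lam > 0 is an assumption (a sparsity penalty is
    normally positive); the text only states lambda < 1.
    """
    return 0 < lam < 1 and 2 * n_timepoints <= k <= 3 * n_timepoints


n_timepoints = 176  # illustrative value only
ok_default = within_rule_of_thumb(400, 0.3, n_timepoints)
ok_large_k = within_rule_of_thumb(800, 0.3, n_timepoints)
ok_large_lam = within_rule_of_thumb(400, 1.5, n_timepoints)
```

With these illustrative numbers, only the first setting (k = 400, λ = 0.3) satisfies the guideline.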

Discussion and conclusion

In this paper, we proposed a novel hybrid task fMRI data analysis framework, named extendable supervised dictionary learning (E-SDL), to systematically identify and characterize diverse and concurrent hemodynamic brain responses under specific task performance. A critical difference between the proposed framework and previous studies is that we systematically extended the basic task paradigm regressor into meaningful regressor groups (as shown in Fig. 3) to account for possible regressor variations as information spreads through the brain in response to a stimulus. Our hypothesis is that the observed brain responses to a specific task are evoked not only by the original stimulus pattern but also by its variations, which may be generated by the information flow in the brain. Our results demonstrated that some of the systematically extended regressors have patterns quite similar to some of the real voxel signal patterns (as shown in Fig. 4). In addition, some of the extended regressors have quite consistent and steady spatial patterns across different subjects and tasks. These results indicate that a framework built on this hypothesis may be more powerful in identifying generalized task-evoked brain responses. Furthermore, as demonstrated in the Results section, E-SDL can identify task-evoked networks (Figs. 6 and 7 and Supplemental Fig. 1) and intrinsic connectivity networks (Fig. 8 and Supplemental Fig. 2) simultaneously, which suggests that the proposed framework may serve as a general framework for identifying diverse and concurrent hemodynamic brain responses under specific task performance.

Compared with the current dominant tfMRI data analysis method, the general linear model (GLM), E-SDL has demonstrated several advantages: (1) more generalized and subtle task-evoked brain networks can be identified. Benefiting from the systematically extended regressors and from the fact that sparse coding and dictionary learning methods do not require the dictionary basis vectors to be orthogonal, we can identify not only the dominant brain responses (GLM activation maps) but also other concurrent task-related networks associated with the extended regressors, as shown in Fig. 7 and Supplemental Fig. 1; (2) better representation of the original tfMRI signals. As shown in Fig. 5 and Supplemental Tables 2 and 3, E-SDL reconstructed the original signals with lower reconstruction errors and much higher Pearson correlation coefficients, which suggests that our framework may better represent the original signals; (3) the ability to identify intrinsic brain networks. As demonstrated in the Results section (Fig. 8 and Supplemental Fig. 2), E-SDL can identify consistent and meaningful intrinsic brain networks that match those reported in Smith et al. (2009); (4) better robustness. Since the systematically extended regressors naturally account for possible regressor variations such as different delay times, the derivative form and the integral form, E-SDL has been shown to be more robust in detecting task-evoked networks, especially when the given basic task paradigm is not accurate. In summary, E-SDL offers a general and reliable framework for identifying diverse and concurrent brain activities under specific task conditions.
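The regressor variations named in advantage (4) can be illustrated with a small sketch. It builds delayed, derivative and integral variants of a basic regressor and uses the Pearson correlation coefficient to find which variant best matches a delayed toy signal. This is a simplified illustration under stated assumptions, not the construction used in the paper (Fig. 3): the basic regressor here is a plain boxcar rather than a paradigm convolved with the hemodynamic response function, the derivative and integral are a first-order difference and a running sum, and all function names are hypothetical.

```python
from itertools import accumulate
from math import sqrt


def delayed(regressor, lag):
    """Shift the regressor right by `lag` samples, zero-padding the start."""
    n = len(regressor)
    lag = max(0, min(lag, n))
    return [0.0] * lag + list(regressor[:n - lag])


def derivative(regressor):
    """First-order backward difference with the same length as the input."""
    if not regressor:
        return []
    return [0.0] + [b - a for a, b in zip(regressor, regressor[1:])]


def integral(regressor):
    """Running (cumulative) sum of the regressor."""
    return list(accumulate(regressor))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Boxcar stand-in for the basic task paradigm regressor (assumption).
basic = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
group = {
    "basic": basic,
    "delay_2": delayed(basic, 2),
    "derivative": derivative(basic),
    "integral": integral(basic),
}

# Toy voxel signal that follows the paradigm two samples late.
voxel = delayed(basic, 2)
best = max(group, key=lambda name: pearson(group[name], voxel))
```

In this toy case the delayed variant matches the signal exactly while the basic regressor does not, which is the kind of mismatch the extended regressor group is meant to absorb when the nominal paradigm timing is inaccurate.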

In this paper, we have focused on evaluating the systematically extended regressors and identifying diverse and concurrent networks under specific task conditions, and the experiments have demonstrated the superiority of the framework over current tfMRI data analysis methods. However, the study could be further enhanced in the following respects. First, with progress in studies of neural hemodynamic mechanisms, more advanced regressor extension methods could be developed and more accurate regressors could be integrated into the proposed extendable supervised dictionary learning framework. Second, brain structure information should be integrated into a future enhanced framework. Brain structure provides valuable information about the functional nodes of the brain and their response patterns, and thus may further improve the activation detection results. Third, more advanced machine learning methods should be adopted. Recently, deep learning methods have achieved remarkable results in a variety of fields, including saliency detection (Zhang et al. 2017; Zhang et al. 2016) and remote sensing image analysis (Yao et al. 2016; Cheng et al. 2016). However, the use of deep learning based methods in fMRI analysis, especially in network detection, is still limited, and adopting them may further improve the results.

In summary, we have proposed a novel hybrid framework to identify diverse and concurrent brain hemodynamic response patterns. With the proposed framework, more subtle and concurrent task-evoked networks as well as intrinsic connectivity networks were identified, which demonstrates the superiority of our framework over the current dominant tfMRI data analysis method. Motivated by these promising results, we plan to apply the proposed framework to more task fMRI datasets to further validate it and to reveal the mechanisms of brain responses under specific task conditions. In addition, we will apply it to brain disorder datasets, such as those for Alzheimer’s disease and schizophrenia, to assess possible alterations of functional interactions between different response networks.