1 Introduction

In the past two decades, functional neuroimaging has become an important tool for studying various neural mechanisms in the brain. In particular, functional magnetic resonance imaging (fMRI) has drawn considerable attention because it is noninvasive and has high spatio-temporal resolution [1,2,3]. Many methods have been used for fMRI data analysis, and these methods are generally divided into model-based methods and data-driven methods [4,5,6]. The model-based methods require a priori information about the experimental paradigm, and they usually consider data from local brain areas only rather than data from the whole brain. In contrast, data-driven methods do not depend on any a priori information, and several of these methods have been fruitfully applied to the field of fMRI data analysis, such as principal component analysis (PCA), independent component analysis (ICA), and clustering analysis (CA).

Among them, the purpose of ICA is to decompose the observed multivariate data into the source signals, which are assumed statistically independent and non-Gaussian. Since it was first introduced into this field for single-subject fMRI data analysis [7], it has become one of the most popular methods to analyze fMRI data [8,9,10]. Compared with univariate methods such as general linear model (GLM) methods based on the single voxel level [11], ICA is a multivariate method that considers the interactions between the voxels and is increasingly being applied to extract functional neural networks from the fMRI data of various cognitive activities without relying on any a priori information. Currently, ICA has been widely used for fMRI data analysis in a resting state [12, 13] or a specific cognitive task-related state [14, 15]. In many circumstances, ICA needs to be used for multiple-subject fMRI data analysis, which is usually referred to as group ICA (GICA) [16].

As a purely data-driven blind source separation technique, ICA does not require any a priori information. However, many studies have shown that the capabilities of ICA can be greatly improved if some a priori information is incorporated into the estimation process when it is available [17,18,19]. This kind of method is usually called constrained ICA (CICA) or ICA with reference (ICA-R) [20, 21]. Compared with the classical ICA, CICA or ICA-R introduces the a priori information into the calculation process and extracts only the sources of interest rather than all the sources. This avoids computing uninteresting sources, facilitates subsequent applications, and reduces the computation time and storage requirements [22, 23]. In addition, the separation quality and accuracy of interesting sources can also be improved through the incorporation of a priori information [24, 25]. On this basis, a great number of other extension methods have been proposed [26,27,28,29].

Currently, the existing methods with a priori information generally consider specific knowledge associated with the sources. For example, the spatial template of some mature networks, such as the visual network or the default mode network, and the specific experimental paradigms of some cognitive task experiments, such as block stimulation mode in a visual cognitive experiment, are considered [30,31,32]. However, this knowledge about the sources is not always known, especially in the case of complex cognitive activity. It is important to get available a priori information from the existing data itself. Recently, we proposed a method called GICA-IR to extract some intrinsic, spatial a priori information from data from groups of subjects. The results demonstrated that the group independent component (GIC) computed by GICA-IR is more representative of the commonality of the subjects in the group through incorporating this information into the GICA extraction procedure [33].

However, as far as we know, there are very few papers that study how to obtain temporal a priori information from the subjects in a group for multi-subject fMRI data analysis. In this paper, we propose a novel method to extract the temporal a priori information from the data from groups of subjects and then incorporate it into a GICA computational process for group fMRI data analysis using the improved multi-objective optimization-based CICA method. The experimental results showed that the GIC computed by the improved CICA method is more representative of the commonality of subjects in the group.

2 Methods

In this section, we first briefly introduce the relevant knowledge about ICA and GICA. Then, a detailed description of the proposed improved CICA method with temporal reference signal is presented. Finally, we provide a description of the experimental data and data processing.

2.1 Independent component analysis

Assume \( \boldsymbol{X}=\left({\boldsymbol{x}}_1,{\boldsymbol{x}}_2,\dots, {\boldsymbol{x}}_T\right) \) is a T × V matrix of observed fMRI data from a single subject, where T and V represent the number of time points and the number of voxels within the brain, respectively. Then, the classical spatial ICA can be formulated as the following linear generative model:

$$ \boldsymbol{X}=\boldsymbol{MS} $$
(1)

where \( \boldsymbol{S}=\left({\boldsymbol{s}}_1,{\boldsymbol{s}}_2,\dots, {\boldsymbol{s}}_N\right) \) is an N × V matrix in which each row represents a spatial source and N denotes the number of sources. These sources are assumed to be unobservable, independent, and non-Gaussian. M is a T × N unknown mixing matrix that mixes the N sources to generate the observed fMRI data, and its columns contain the associated time courses of the N source signals. Solving the ICA problem amounts to estimating an N × T unmixing matrix \( \boldsymbol{W}=\left({\boldsymbol{w}}_1,{\boldsymbol{w}}_2,\dots, {\boldsymbol{w}}_N\right) \) such that \( \boldsymbol{Y}=\left({\boldsymbol{y}}_1,{\boldsymbol{y}}_2,\dots, {\boldsymbol{y}}_N\right) \) is a good approximation of the sources S according to the following equation:

$$ \boldsymbol{Y}=\boldsymbol{WX} $$
(2)

Many algorithms can be used to solve the ICA model in (1), and currently the most widely used ICA algorithms include InfoMax [34] and FastICA [35].
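As a minimal sketch of the shapes involved in Eqs. (1) and (2), the following Python/NumPy snippet builds synthetic data from the generative model and unmixes it. The sizes, the Laplacian source distribution, and the use of the pseudo-inverse of a known mixing matrix are assumptions made purely for illustration; a real ICA algorithm such as InfoMax or FastICA has to estimate W from X alone.

```python
import numpy as np

rng = np.random.default_rng(0)
T, V, N = 50, 2000, 3  # time points, voxels, sources (illustrative sizes)

# Non-Gaussian (Laplacian) spatial sources, one per row: S is N x V.
S = rng.laplace(size=(N, V))
# Mixing matrix whose columns are the source time courses: M is T x N.
M = rng.standard_normal((T, N))

# Eq. (1): observed data, T x V.
X = M @ S

# Because M is known in this toy setting, its pseudo-inverse (N x T) is a
# valid unmixing matrix. This is NOT how ICA estimates W in practice.
W = np.linalg.pinv(M)

# Eq. (2): estimated sources, N x V.
Y = W @ X

print(X.shape, W.shape, Y.shape)
print("sources recovered:", np.allclose(Y, S))
```

In practice the estimated sources are recovered only up to sign, scale, and ordering, which is why later comparisons rely on absolute correlation values.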

2.2 Group independent component analysis

ICA applied to multi-subject fMRI data is often referred to as GICA. Temporal concatenation GICA (TCGICA) is the most widely used of the existing GICA approaches, and it assumes that all subjects have common, spatially independent components (ICs). Specifically, assuming that there are K subjects in total, and X i is a T × V matrix that represents the fMRI data of each subject i (i = 1,  ⋯ , K), the TCGICA concatenates the fMRI data of K subjects along the temporal dimension and then decomposes the KT × V group data as follows:

$$ \widehat{\boldsymbol{X}}={\left({\boldsymbol{X}}_1^{\prime },{\boldsymbol{X}}_2^{\prime },\cdots, {\boldsymbol{X}}_K^{\prime}\right)}^{\prime }=\widehat{\boldsymbol{M}}\widehat{\boldsymbol{S}} $$
(3)

where \( \widehat{\boldsymbol{M}} \) is a KT × L group mixing matrix, \( \widehat{\boldsymbol{S}} \) is a L × V matrix in which each row represents a GIC, and L denotes the number of GICs.
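The temporal concatenation in Eq. (3) can be sketched as follows. The data are synthetic, the sizes are arbitrary, and the variable names are hypothetical; the point is only that stacking the subject matrices along time yields a KT × V matrix whose mixing matrix is the stack of the subject-specific mixing matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, V, L = 5, 40, 500, 4  # subjects, time points, voxels, GICs (illustrative)

# Shared spatial components (one GIC per row), L x V.
S_hat = rng.laplace(size=(L, V))

mixing, subjects = [], []
for _ in range(K):
    M_i = rng.standard_normal((T, L))  # subject-specific time courses
    mixing.append(M_i)
    subjects.append(M_i @ S_hat)       # X_i, T x V

# Eq. (3): concatenate along the temporal dimension.
X_hat = np.vstack(subjects)            # KT x V
M_hat = np.vstack(mixing)              # KT x L group mixing matrix

print(X_hat.shape, M_hat.shape)
print("X_hat == M_hat @ S_hat:", np.allclose(X_hat, M_hat @ S_hat))
```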

2.3 The improved CICA with temporal reference signal

In this subsection, we first give the specific steps of the method for extracting the temporal reference signal from the group of subjects, and then we present a detailed description of the improved multi-objective optimization-based CICA method.

2.3.1 The extraction of the temporal reference signal

We assume there are a total of K subjects in the group, and all subjects have T time points and V voxels after normalization. First, we implemented ICA on each of the subjects in the group. For each subject i, ICA was defined as

$$ {\boldsymbol{X}}_i={\boldsymbol{M}}_i{\boldsymbol{S}}_i,\left(i=1,2,\dots, \mathrm{K}\right) $$
(4)

where X i is the T × V observed fMRI data, \( {\boldsymbol{S}}_i={\left({\boldsymbol{s}}_{i1},{\boldsymbol{s}}_{i2},\dots, {\boldsymbol{s}}_{i{N}_i}\right)}^{\prime } \) is an N i  × V matrix, and each row represents an IC of subject i. \( {\boldsymbol{M}}_i=\left({\boldsymbol{m}}_{i1},{\boldsymbol{m}}_{i2},\dots, {\boldsymbol{m}}_{i{N}_i}\right) \) is a T × N i mixing matrix. For the sake of simplicity, we only considered the case where each subject has just one IC of interest, and the correspondence of the ICs of different subjects could be obtained by the absolute value of the spatial correlation [36].

We then denote by \( {\boldsymbol{s}}_{i{n}_i}\left(i=1,2,\dots, \mathrm{K}\right) \) the \( {n}_i \)-th IC, which is the source of interest for subject i, and by \( {\boldsymbol{m}}_{i{n}_i}\left(i=1,2,\dots, \mathrm{K}\right) \) the corresponding time course. Finally, these time courses were concatenated into a single, longer time course:

$$ \boldsymbol{m}={\left({{\boldsymbol{m}}_{1{n}_1}}^{\prime },{{\boldsymbol{m}}_{2{n}_2}}^{\prime },\dots, {{\boldsymbol{m}}_{K{n}_K}}^{\prime}\right)}^{\prime } $$
(5)

where m represents the temporal reference signal, which is a column vector of size KT × 1, and it is used as \( {\boldsymbol{rt}}_i \) in the following improved multi-objective optimization-based CICA method for group fMRI data analysis.
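The extraction steps in Eqs. (4) and (5) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the subject-level decompositions are simulated rather than estimated, the IC of interest is matched by absolute spatial correlation with a hypothetical `template` map, and the function name `temporal_reference` is invented for this example.

```python
import numpy as np

def abs_spatial_corr(ic, template):
    """Absolute Pearson correlation between two spatial maps."""
    return abs(np.corrcoef(ic, template)[0, 1])

def temporal_reference(M_list, S_list, template):
    """Concatenate, over subjects, the time course of the best-matching IC."""
    segments = []
    for M_i, S_i in zip(M_list, S_list):
        scores = [abs_spatial_corr(row, template) for row in S_i]
        n_i = int(np.argmax(scores))   # index of the IC of interest
        segments.append(M_i[:, n_i])   # its time course, length T
    return np.concatenate(segments)    # Eq. (5): length K*T

# Simulated subject-level ICA outputs (assumption: ICA has already been run).
rng = np.random.default_rng(2)
K, T, V, N = 3, 20, 300, 4
template = rng.laplace(size=V)
M_list, S_list, truth = [], [], []
for _ in range(K):
    S_i = rng.laplace(size=(N, V))
    pos = int(rng.integers(N))                          # where the IC of interest sits
    S_i[pos] = template + 0.1 * rng.standard_normal(V)  # noisy copy of the template
    M_i = rng.standard_normal((T, N))
    M_list.append(M_i)
    S_list.append(S_i)
    truth.append(M_i[:, pos])

m = temporal_reference(M_list, S_list, template)
print(m.shape)
```

The sketch does not handle the sign indeterminacy of ICA; how the signs of the subject-level time courses are aligned before concatenation is not specified in the text.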

2.3.2 The improved multi-objective optimization-based CICA method

In this paper, the proposed improved CICA with temporal reference signal method was established with the multi-objective optimization framework as follows:

$$ {\displaystyle \begin{array}{c} Maximize\ \left\{\begin{array}{c}J\left({\boldsymbol{w}}_{\boldsymbol{i}}\right)\approx {\left\{E\left[G\left({\widehat{\boldsymbol{s}}}_{\boldsymbol{i}}\right)\right]-E\left[G\left(\boldsymbol{v}\right)\right]\right\}}^2\\ {}{\varepsilon}_1\left({\boldsymbol{w}}_{\boldsymbol{i}}\right)= abs\left(E\left[{\widehat{\boldsymbol{m}}}_{\boldsymbol{i}}\bullet {\boldsymbol{rt}}_{\boldsymbol{i}}\right]\right)\end{array}\right.\\ {} Subject to\ {\left\Vert {\boldsymbol{w}}_{\boldsymbol{i}}\right\Vert}^2=1\end{array}} $$
(6)

where \( J\left({\boldsymbol{w}}_i\right) \) is the negentropy of the estimated IC, \( {\widehat{\boldsymbol{s}}}_{\boldsymbol{i}}={\boldsymbol{w}}_{\boldsymbol{i}}^{\boldsymbol{T}}\widehat{\boldsymbol{X}} \). \( {\widehat{\boldsymbol{m}}}_{\boldsymbol{i}}={\boldsymbol{Z}}^{-1}{\boldsymbol{w}}_i \) denotes the time course corresponding to \( {\widehat{\boldsymbol{s}}}_{\boldsymbol{i}} \), and it is a column vector of size KT × 1. Z denotes the L × KT whitening matrix, which is obtained by eigenvalue decomposition, and L denotes the number of ICs. v is a Gaussian random variable with a zero mean and a unit variance. G(∙) is a non-quadratic function, and G(v) = log(cosh(v)) is used in this paper. \( {\boldsymbol{rt}}_i \) denotes a temporal reference signal, which is a column vector of size KT × 1, and \( {\varepsilon}_1\left({\boldsymbol{w}}_{\boldsymbol{i}}\right)= abs\left(E\left[{\widehat{\boldsymbol{m}}}_{\boldsymbol{i}}\bullet {\boldsymbol{rt}}_{\boldsymbol{i}}\right]\right) \) is specifically defined as the Pearson correlation coefficient to measure the closeness between \( {\widehat{\boldsymbol{m}}}_{\boldsymbol{i}} \) and \( {\boldsymbol{rt}}_i \), where both \( {\widehat{\boldsymbol{m}}}_{\boldsymbol{i}} \) and \( {\boldsymbol{rt}}_i \) have a zero mean and unit variance. Each solution of the multi-objective optimization problem in (6) corresponds to an optimal unmixing column vector \( {\boldsymbol{w}}_i \) that is constrained to \( {\left\Vert {\boldsymbol{w}}_i\right\Vert}^2=1 \).
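The two objectives in (6) can be evaluated numerically as in the following sketch. The function names are hypothetical, the Gaussian expectation E[G(v)] is obtained here by Gauss-Hermite quadrature (one of several possible choices), and the explicit z-scoring of the inputs is an assumption that stands in for the whitening and unit-norm constraint of the actual method.

```python
import numpy as np

def log_cosh(u):
    """G(u) = log(cosh(u)), the non-quadratic function used in (6)."""
    return np.log(np.cosh(u))

# E[G(v)] for v ~ N(0, 1), via probabilists' Gauss-Hermite quadrature.
_nodes, _weights = np.polynomial.hermite_e.hermegauss(64)
E_G_GAUSS = float(np.sum(_weights * log_cosh(_nodes)) / np.sqrt(2.0 * np.pi))

def zscore(u):
    """Rescale to zero mean and unit variance."""
    u = np.asarray(u, dtype=float)
    return (u - u.mean()) / u.std()

def negentropy(s_hat):
    """J ~ {E[G(s_hat)] - E[G(v)]}^2 for a standardized component."""
    return (np.mean(log_cosh(zscore(s_hat))) - E_G_GAUSS) ** 2

def temporal_closeness(m_hat, rt):
    """eps_1 = |E[m_hat * rt]| for standardized signals (absolute Pearson r)."""
    return abs(np.mean(zscore(m_hat) * zscore(rt)))

rng = np.random.default_rng(4)
gaussian_map = rng.standard_normal(20000)
laplacian_map = rng.laplace(size=20000)
rt = rng.standard_normal(200)

print("E[G(v)] =", round(E_G_GAUSS, 4))
print("Laplacian more non-Gaussian:",
      negentropy(gaussian_map) < negentropy(laplacian_map))
print("closeness of -rt to rt:", temporal_closeness(-rt, rt))
```

Because of the absolute value, a time course and its sign-flipped copy are equally close to the reference, which matches the sign ambiguity of ICA.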

For multi-objective optimization problems, there is no global solution that makes all of the cost functions achieve the optimum simultaneously. Therefore, a trade-off solution is needed to balance the optimality of all cost functions. Among the methods of solving the multi-objective optimization problem, the weighted summing method is simple and efficient, and it is achieved by optimizing the weighted sum function of the objective functions on the condition that the weight value of each objective function is positive and the sum of all weights is 1 [37]. Therefore, this method was adopted to solve the multi-objective optimization problem of (6) in our study. To avoid the calculation process being controlled by the objective function with a larger value, the arc-tangent function was used to normalize the objective function J(w i ) in (6):

$$ {f}_1\left({\boldsymbol{w}}_i\right)=\left(2/\pi \right)\bullet \arctan \left[{c}_i\bullet J\left({\boldsymbol{w}}_{\boldsymbol{i}}\right)\right] $$
(7)

where \( {c}_i \) in (7) is automatically determined so that the possible values of \( {f}_1\left({\boldsymbol{w}}_i\right) \) and \( {\varepsilon}_1\left({\boldsymbol{w}}_i\right) \) range from 0 to 1 [36]. Then, the reformulated linear weighted objective function is

$$ f\left({\boldsymbol{w}}_i\right)={a}_1\bullet {f}_1\left({\boldsymbol{w}}_i\right)+{a}_2\bullet {\varepsilon}_1\left({\boldsymbol{w}}_{\boldsymbol{i}}\right) $$
(8)

where \( {a}_i\ \left(i=1,2\right) \) are the weight parameters and \( {a}_1+{a}_2=1 \). Then, the iteration algorithm for optimizing \( f\left({\boldsymbol{w}}_i\right) \) can be derived as follows:

$$ \nabla f\left({\boldsymbol{w}}_i\right)={a}_1\bullet \nabla {f}_1\left({\boldsymbol{w}}_i\right)+{a}_2\bullet \nabla {\varepsilon}_1\left({\boldsymbol{w}}_{\boldsymbol{i}}\right)=2{a}_1\bullet \left(2/\pi \right)\bullet {c}_i/\left(1+{\left[{c}_i\bullet J\left({\boldsymbol{w}}_i\right)\right]}^2\right)\bullet \left(E\left[G\left({\boldsymbol{w}}_i^T\widehat{\boldsymbol{X}}\right)\right]-E\left[G\left(\boldsymbol{v}\right)\right]\right)\bullet E\left[\widehat{\boldsymbol{X}}\bullet g\left({\boldsymbol{w}}_i^T\widehat{\boldsymbol{X}}\right)\right]+{a}_2\bullet E\left[{\boldsymbol{Z}}^{-1}{\boldsymbol{rt}}_i\right] $$
(9)

where g(∙) is the derivative of G(∙), so g(v) = tanh(v). E(∙) can be estimated as the mean of all samples. Once the gradient of the weighted sum function in (8) is calculated, the steepest ascent iteration formula can be set up as follows:

$$ {\boldsymbol{w}}_i\left(k+1\right)={\boldsymbol{w}}_i(k)+\mu (k)\bullet {d}_i(k) $$
(10)

where \( {\boldsymbol{w}}_i(k) \) denotes the value of \( {\boldsymbol{w}}_i \) after the kth iteration, \( {d}_i(k)=\nabla f\left({\boldsymbol{w}}_i(k)\right)/\left\Vert \nabla f\left({\boldsymbol{w}}_i(k)\right)\right\Vert \), and μ(k) denotes the step length. Finally, the corresponding time course can be calculated by using the following formula when the IC \( {\widehat{\boldsymbol{s}}}_{\boldsymbol{i}} \) is obtained:

$$ {\widehat{\boldsymbol{m}}}_{\boldsymbol{i}}={\boldsymbol{Z}}^{-1}{\boldsymbol{w}}_i $$
(11)

where w i is the unmixing column vector corresponding to \( {\widehat{\boldsymbol{s}}}_{\boldsymbol{i}}. \)
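The arc-tangent normalization in (7), the weighted sum in (8), and the normalized steepest-ascent update in (10) can be sketched as follows. Several parts are assumptions made for illustration: the constant `c` is passed in by hand (the text determines it automatically), the step length is constant, the vector is rescaled to unit norm after each step to respect the constraint in (6), and the ascent is demonstrated on a toy objective that contains only a temporal-closeness-like term, with a random matrix `A` standing in for the de-whitening matrix.

```python
import numpy as np

def f1(J, c):
    """Eq. (7): map a non-negative negentropy value J into [0, 1)."""
    return (2.0 / np.pi) * np.arctan(c * J)

def weighted_sum(f1_value, eps1_value, a1):
    """Eq. (8) with a2 = 1 - a1, so the weights sum to 1."""
    return a1 * f1_value + (1.0 - a1) * eps1_value

def steepest_ascent(grad, w0, step=0.1, n_iter=300):
    """Iterate Eq. (10) with a constant step, keeping ||w|| = 1."""
    w = w0 / np.linalg.norm(w0)
    for _ in range(n_iter):
        g = grad(w)
        d = g / np.linalg.norm(g)   # normalized ascent direction d(k)
        w = w + step * d            # Eq. (10)
        w = w / np.linalg.norm(w)   # assumed projection onto the unit sphere
    return w

# Toy problem: maximize |rt^T A w| / KT over unit vectors w.
rng = np.random.default_rng(3)
KT, L = 60, 4
A = rng.standard_normal((KT, L))    # hypothetical stand-in for Z^{-1}
rt = rng.standard_normal(KT)

def grad_temporal(w):
    """Gradient of |rt^T A w| / KT with respect to w."""
    return np.sign(rt @ A @ w) * (A.T @ rt) / KT

w = steepest_ascent(grad_temporal, rng.standard_normal(L))
target = A.T @ rt / np.linalg.norm(A.T @ rt)  # known maximizer, up to sign

print("f1(0.01, c=100) =", f1(0.01, 100.0))
print("weighted sum    =", weighted_sum(0.2, 0.8, a1=0.5))
print("|cos(w, target)| =", abs(w @ target))
```

For this toy objective the maximizer is known in closed form, which makes it easy to check that the iteration converges; the full objective in (8) has no such closed form and would use the gradient in (9).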

Finally, the whole process of the proposed method is summarized in a flowchart (see Fig. 1).

Fig. 1

The flowchart of the proposed method. \( {\boldsymbol{X}}_i \) (i = 1, 2, …, K) represents the fMRI data of subject i. \( {\boldsymbol{M}}_i \) (i = 1, 2, …, K) and \( {\boldsymbol{S}}_i \) (i = 1, 2, …, K) represent the temporal and spatial components of subject i, respectively, which are obtained using ICA. m represents the temporal reference \( {\boldsymbol{rt}}_i \). \( \widehat{\boldsymbol{M}} \) and \( \widehat{\boldsymbol{S}} \) represent the group temporal and spatial components, respectively, which are obtained using the improved CICA method

2.4 Experimental data

In this subsection, the simulated and real fMRI data were used to evaluate the performance of the improved CICA method at the group level for fMRI data analysis.

2.4.1 Simulated data

The simulated fMRI data were obtained using the code downloaded from http://mlsp.umbc.edu/simulated_fmri_data.html [38], where a set of spatial sources with different simulated hemodynamic time courses were designed to generate the data through linear superposition. Specifically, the data for each simulated subject were produced by mixing the eight original sources with their corresponding time courses, and a total of 100 sample images were included in the data, where each source image had 60 × 60 pixels with 100 time points (see Fig. 2). The eight original sources were designed such that source 1 was task related, source 2 and source 6 were transiently task related, source 5 was function related, and source 3–source 4 and source 7–source 8 were artifact related. In particular, the task-related source 1 had a time course similar to the block-like shape that is often used to imitate an experimental paradigm.

Fig. 2

The simulated original sources and their corresponding time courses, which are all normalized to have a mean of zero and a unit variance. Specifically, source 1 represents task-related information, source 2 and source 6 represent transiently task-related information, source 5 represents function-related information, and the other four sources (source 3–source 4 and source 7–source 8) represent artifact-related information

In this experiment, 20 groups of simulated datasets were produced from the same original sources/TCs by adding specific variability to each subject, and each simulated dataset included five subjects. The spatial variation in the sources of each subject was introduced by adding Gaussian noise with a different signal-to-noise ratio (SNR) to the source images; the SNRs ranged from 0.3 to 0.4 and were randomly determined for each subject. The temporal variation in the time courses was simulated by applying a time delay and an amplitude modulation, which were also randomly determined for the time courses.
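The subject-level variability described above can be sketched as follows. The text does not give the exact definitions, so every specific choice here is an assumption: SNR is taken as the ratio of signal standard deviation to noise standard deviation, the time delay is a circular shift of a few samples, the amplitude modulation is a single random gain, and `max_delay`, `gain_range`, and the block-shaped toy source are hypothetical.

```python
import numpy as np

def make_subject(source, tc, rng, snr_range=(0.3, 0.4),
                 max_delay=3, gain_range=(0.8, 1.2)):
    """Return a noisy source image and a delayed, rescaled time course."""
    snr = rng.uniform(*snr_range)
    # Assumed definition: SNR = std(signal) / std(noise).
    noise_std = source.std() / snr
    noisy_source = source + rng.normal(0.0, noise_std, size=source.shape)

    delay = int(rng.integers(-max_delay, max_delay + 1))  # in samples
    gain = rng.uniform(*gain_range)
    varied_tc = gain * np.roll(tc, delay)                 # circular shift (assumption)
    return noisy_source, varied_tc, snr

rng = np.random.default_rng(5)

# Toy 60 x 60 source with one active square, and a 100-point block time course.
source = np.zeros((60, 60))
source[20:40, 20:40] = 1.0
tc = np.tile(np.r_[np.zeros(10), np.ones(10)], 5)

group = [make_subject(source, tc, rng) for _ in range(5)]  # five subjects
for noisy, varied, snr in group:
    measured = source.std() / (noisy - source).std()
    print(f"target SNR {snr:.3f}, measured {measured:.3f}")
```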

2.4.2 Real fMRI data

A real fMRI dataset from five subjects who completed a visual task was included in this study. All five subjects were notified about the aim of this study, and they signed a written consent letter. The block pattern of OFF-ON-OFF-ON-OFF-ON was used as the experimental paradigm, and each block lasted 20 s. In the “ON” state, the visual stimulus corresponded to a radial blue/yellow checkerboard that reversed at 7 Hz. In the “OFF” state, the participants were required to focus on a cross at the center of the screen. The BOLD fMRI data from two subjects were acquired using single-shot SENSE gradient echo EPI with 37 slices, providing whole-brain coverage and 70 volumes, a TR of 2.0 s, and a scan resolution of 64 × 64. The in-plane resolution was 4 mm × 4 mm, and the slice thickness was 4 mm. The other three subjects were acquired using single-shot SENSE gradient echo EPI with 40 slices, providing whole-brain coverage and 70 volumes, a TR of 2.0 s, and a scan resolution of 80 × 80. The in-plane resolution was 3 mm × 3 mm, and the slice thickness was 3 mm.

2.5 Data processing

All of the calculations in this study were implemented on a workstation whose operating system was Windows 7 Unlimited Service Pack 1, with an Intel(R) Xeon(R) E5-1620 3.60 GHz processor and 40 GB RAM. The preprocessing and calculation steps of the FastICA and the improved CICA methods were run using Matlab (Matlab, 2012b, MathWorks Inc., Sherborn, MA, USA) [39].

The preprocessing steps of the real-data experiment were implemented using SPM8 software (http://www.fil.ion.ucl.ac.uk/spm/), which included slice timing, motion correction, spatial normalization, and smoothing with a Gaussian kernel of 8 mm. In all experiments, FastICA, the newly published method [39] (which is denoted as CICA in this paper), and the improved CICA method were used. Specifically, FastICA was implemented using GIFT software (v2.0e) (http://mialab.mrn.org/software/) for the purpose of comparison. Moreover, ICASSO [40] with 20 runs of ICA was used to obtain reliable ICs, and MDL [41] was used to estimate the number of ICs. Furthermore, the positioning and display of the spatial networks were implemented using MRIcro software (http://www.mricro.com).

3 Results

In this section, the advantages of the improved CICA method are demonstrated by comparing the experimental results obtained with the improved CICA method with those obtained with FastICA and CICA for the group fMRI data analysis. Specifically, the spatial a priori information in CICA was obtained using a previously described method [33], and a detailed description is presented in Appendix 1.

First, a power analysis of the receiver operating characteristic (ROC) curve was adopted to evaluate the spatial detection ability of these methods in the simulated experiment. This ability is quantified by the area under the ROC curve (AUC), and a larger AUC generally indicates better detection [42]. Second, the correlations between the true time course and the time courses computed by FastICA, CICA, and the improved CICA methods were used to measure the temporal performance, which can be calculated using the following formula:

$$ corrcoef\_ TC= abs\left(\mathit{\operatorname{cov}}\left(\boldsymbol{TTC},\kern0.5em \boldsymbol{TC}\right)/\sqrt{\mathit{\operatorname{cov}}\left(\boldsymbol{TTC}\right)\mathit{\operatorname{cov}}\left(\boldsymbol{TC}\right)}\right) $$
(12)

where TTC represents the true time course and TC represents the time course computed by each method. Finally, the correlations between the GICs computed by FastICA, CICA, and the improved CICA methods and the corresponding IC of each subject were used to evaluate the group-level analysis, which can be calculated as follows:

$$ corrcoef\_{IC}_i= abs\left(\mathit{\operatorname{cov}}\left(\boldsymbol{GIC},{\boldsymbol{IC}}_{\boldsymbol{i}}\right)/\sqrt{\mathit{\operatorname{cov}}\left(\boldsymbol{GIC}\right)\mathit{\operatorname{cov}}\left({\boldsymbol{IC}}_{\boldsymbol{i}}\right)}\right) $$
(13)

where GIC represents the group independent component and IC i represents the corresponding independent component of subject i.
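Formulas (12) and (13) are the same absolute Pearson correlation applied to time courses and to spatial maps, respectively. A minimal sketch, with a hypothetical function name and synthetic inputs, is:

```python
import numpy as np

def abs_corrcoef(a, b):
    """|cov(a, b)| / sqrt(var(a) * var(b)), as in formulas (12) and (13)."""
    a = np.ravel(a)
    b = np.ravel(b)
    c = np.cov(a, b)  # 2 x 2 covariance matrix
    return abs(c[0, 1] / np.sqrt(c[0, 0] * c[1, 1]))

rng = np.random.default_rng(6)
ttc = rng.standard_normal(100)                 # stand-in for a true time course
tc = -2.0 * ttc + 1.0                          # sign-flipped, rescaled copy
noisy_tc = ttc + rng.standard_normal(100)      # noisy copy

print(abs_corrcoef(ttc, tc))        # sign and scale do not matter
print(abs_corrcoef(ttc, noisy_tc))
```

The absolute value makes the measure insensitive to the arbitrary sign of an estimated component.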

In the experiments in this paper, the weighting parameter “a” of the improved CICA method took values from 0.1 to 0.9 in steps of 0.1, and the value to use was chosen according to an evaluation of the experimental results obtained for each “a”; the results for the chosen value are reported as the final experimental results. Specifically, to guarantee that the optimal weighting parameter was selected consistently for the simulated-data and real-data experiments, the index given by formula (14) was used to select the best case. We first calculated the average, across all subjects, of the correlation coefficients between the GIC and the corresponding IC of each subject, and then added this average to the correlation coefficient between the true time course and the computed time course. The resulting sum was used as the quantitative indicator for selecting the weighting parameter of the improved CICA method:

$$ index= corrcoef\_ TC+\sum_{i=1}^K corrcoef\_{IC}_i/K $$
(14)

where K denotes the number of subjects in the group and K = 5 for both simulated data and real data experiments in this paper.
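The selection rule based on formula (14) can be sketched as follows. The correlation values in `results` are made-up numbers used only to exercise the code (they are not results from this study), only three candidate weights are shown instead of nine, and the function names are hypothetical.

```python
import numpy as np

def selection_index(corrcoef_tc, corrcoef_ics):
    """Formula (14): temporal CC plus the mean of the K subject-level CCs."""
    return corrcoef_tc + float(np.mean(corrcoef_ics))

def select_weight(results):
    """Return the weight with the largest index, and all index values."""
    indices = {a: selection_index(tc, ics) for a, (tc, ics) in results.items()}
    best = max(indices, key=indices.get)
    return best, indices

# Illustrative, invented values: a -> (corrcoef_TC, [corrcoef_IC_1..K]), K = 5.
results = {
    0.1: (0.80, [0.70, 0.72, 0.68, 0.71, 0.69]),
    0.5: (0.85, [0.71, 0.73, 0.70, 0.72, 0.69]),
    0.9: (0.90, [0.72, 0.74, 0.70, 0.73, 0.71]),
}

best_a, indices = select_weight(results)
for a, value in indices.items():
    print(f"a = {a}: index = {value:.3f}")
print("selected a =", best_a)
```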

3.1 Simulated data results

In this experiment, we focused on the task-related source 1, and its corresponding time course had a block-like shape that closely matched the experimental paradigm. The results of using weighting parameter a = 0.6 for dataset 8, a = 0.1 for dataset 14, and a = 0.9 for other datasets are presented according to formula (14).

Figure 3 shows the AUCs of the ROC curves of the GICs that were computed by FastICA, CICA, and the improved CICA methods on the 20 simulated datasets. The figure shows that the AUCs of CICA were significantly higher than those of the improved CICA (except for dataset 14) and those of FastICA across all datasets, and that the AUCs of the improved CICA were significantly higher than those of FastICA (except for dataset 8). These significant differences were verified by t tests at a confidence level of 95%, which demonstrates that CICA had the best source recovery ability and that the improved CICA method with the temporal reference signal extracted from the group of subjects had better source recovery ability than FastICA.

Fig. 3

The AUCs of FastICA, CICA, and the improved CICA methods on the 20 simulated datasets
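For reference, the AUC used in this comparison can be computed from a score map and a binary ground-truth mask as the probability that a truly active voxel scores higher than an inactive one. The sketch below uses this pairwise (Mann-Whitney) formulation; the function name is hypothetical, and the text does not state which AUC implementation was actually used.

```python
import numpy as np

def roc_auc(scores, truth):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    scores = np.ravel(scores).astype(float)
    truth = np.ravel(truth).astype(bool)
    pos, neg = scores[truth], scores[~truth]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Small worked example: 3 of the 4 (active, inactive) pairs are ordered correctly.
auc_small = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc_small)

# Toy spatial map: an active square plus Gaussian noise.
rng = np.random.default_rng(7)
mask = np.zeros((60, 60), dtype=bool)
mask[20:40, 20:40] = True
score_map = mask.astype(float) + rng.standard_normal(mask.shape)
print(roc_auc(np.abs(score_map), mask))
```

The pairwise form costs memory proportional to the number of active voxels times the number of inactive voxels, which is acceptable for small simulated images but not for whole-brain volumes.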

Figure 4 shows the correlation coefficients (CCs) between the true time course and the group time courses computed by FastICA, CICA, and the improved CICA methods on the 20 simulated datasets. The figure shows that the CCs of the improved CICA were significantly higher than those of CICA and FastICA across all simulated datasets (t tests at a confidence level of 95%), while the CCs of CICA and FastICA were not significantly different. These results demonstrate that the improved CICA method had better temporal detection performance than the CICA and FastICA methods.

Fig. 4

The CCs between the true time course and the time courses computed by FastICA, CICA, and the improved CICA methods on the 20 simulated datasets

Figure 5 shows the average CCs of the GIC with the corresponding IC of each subject in the group across the 20 simulated datasets and their standard deviations, which are obtained using GICs computed by FastICA, CICA, and the improved CICA methods. We can see from the figure that the CCs calculated by CICA and the improved CICA methods were significantly higher than those of FastICA, which was verified by T tests at a confidence level of 95%. At the same time, the CCs of the improved CICA were slightly higher than those of CICA, but they were not significantly different. These results demonstrate that the GIC computed by the improved CICA method was more representative of the commonality of subjects in the group. That is, the temporal reference signal extracted from the group of subjects improved the analysis of the group data.

Fig. 5

The average CCs, across the 20 simulated datasets, between the GIC and the corresponding IC of each of the five subjects and their standard deviations, which are obtained using GICs computed by FastICA, CICA, and the improved CICA methods

3.2 Real data results

In this section, only the task-related independent component is considered. The performance of the improved CICA method was compared with FastICA and CICA for fMRI data analysis at the group level. The results using weighting parameter a = 0.9 with the improved CICA method are presented according to formula (14).

Figure 6 shows the visual regions detected by FastICA, CICA, and the improved CICA methods. We can see from the figure that the regions of the improved CICA method are better than those of FastICA and similar to those of CICA, which means that the improved CICA method is superior to FastICA in source recovery using the temporal reference signal from the group of subjects and has the same performance as the CICA method.

Fig. 6

The visual areas in slices 28 to 43 detected by FastICA, CICA, and the improved CICA methods. All the spatial maps are z-scored with the same threshold of 2

Figure 7 shows the prior block and the time courses computed by FastICA, CICA, and the improved CICA methods as well as the CCs between the prior block and the time course of each method. It can be clearly seen from the figure that the CC of the improved CICA method is higher than those of CICA and FastICA, which means that the time course computed by the improved CICA method is more accurate than those of FastICA and CICA and further demonstrates its better temporal performance.

Fig. 7

The prior block (black) and time courses computed by FastICA (blue), CICA (green), and the improved CICA method (red), and the CCs between the prior block and the time course of each method, including corrcoef1 for FastICA, corrcoef2 for CICA, and corrcoef3 for the improved CICA method (color figure online)

Figure 8 shows the CCs between the GIC and the corresponding IC of each subject in the group. The results were obtained using GICs computed by FastICA, CICA, and the improved CICA methods. The figure shows that the GICs computed by CICA and the improved CICA methods have a higher correlation with the corresponding IC of each subject than the GIC computed by FastICA. These significant differences were found using t tests at a confidence level of 95%. Although the CCs of the improved CICA were slightly lower than those of CICA, they were not significantly different. This result indicates that the GIC calculated by the improved CICA method reflects the commonality of subjects in the group better than the GIC calculated by FastICA.

Fig. 8

The CCs between the GIC and the corresponding IC of each subject in the group. The results were obtained using GICs computed by FastICA, CICA, and the improved CICA methods

4 Discussion

In this study, the experimental results with the simulated and real fMRI data showed that the improved CICA method with the temporal a priori information extracted from the group data performed better in detecting brain functional connectivity. First, the results in Figs. 3 and 6 show that the spatial source recovery of the improved CICA method was better than that of FastICA, but it was not as good as that of CICA. Second, the results in Figs. 4 and 7 show that the time courses of the corresponding sources computed by the improved CICA method were more accurate than those of the FastICA and CICA methods. Finally, the results in Figs. 5 and 8 demonstrate that the correlation between the GIC computed by the improved CICA method and the corresponding IC of each subject in the group was improved in comparison with that of FastICA, although there was no significant difference from CICA. This means that the GIC computed by the improved CICA method was more representative of the commonality of the subjects in the group than that computed by FastICA.

In this paper, to imitate the noise contained in real fMRI data, noise with different SNRs ranging from 0.3 to 0.4 was added to the simulated data, which were used to evaluate the performance of the different methods. If the SNR of the added noise was too small, all methods showed poor performance in signal detection, and the evaluation lost its significance. Conversely, if the SNR of the added noise was too large, all methods produced better detection results, and there was no obvious difference between the methods to compare. The SNR range adopted in this paper was a trade-off between these two situations, and it was much closer to the noise found in real data.

In the classical CICA method, the a priori reference signal was incorporated using a constraint condition g(y) = ε(y, r) − ξ ≤ 0, where y denotes the output signal, r denotes a reference signal, ε(y, r) is a distance criterion, and ξ is a threshold parameter that needs to limit the distance such that the desired output signal should be the only one satisfying the inequality constraint. However, it is difficult to predetermine the threshold parameter ξ in practical application because the ICs are blind, so the choice of a suitable ξ is quite dependent on the applied CICA. Improper ξ often leads to two possible consequences. When ξ is beyond the upper bound of the feasible range, the output may produce an undesired IC. On the other hand, when ξ is smaller than the lower bound of the range, the output cannot produce any IC. Therefore, special effort has to be made to determine a proper parameter. In this paper, the multi-objective optimization strategy was applied to estimate ICs with the CICA method, which circumvents the selection of threshold parameter ξ, and the results demonstrated its improved performance.

In this paper, the weighted summation method was used to solve the multi-objective optimization problem of Eq. (6). The weight parameters “a 1” and “a 2” in formula (8) reflect the importance of the corresponding objective functions f 1(w i ) and ε 1(w i ) in the summation function f(w i ). The goal of the weighted summation method is to seek a balance between the independence of the output signal and the similarity with the reference signal and then to obtain a source signal that is the closest to the reference signal with the largest independence. The linear weighted summation method for solving multi-objective optimization problems was proposed by Klamroth et al. [37]. As long as the weight parameters are strictly positive and sum to 1, each choice of weights yields one point of the Pareto optimal set [43].

When using the linear weighted summation method to solve the multi-objective optimization problem, the summation function will contain the corresponding weight parameters that are usually determined manually according to artificial experience, and this process means that the experimental results obtained by this approach will contain many kinds of situations. Therefore, it is necessary to choose the best situation according to certain evaluation indicators from these results by additional post-processing steps. In the experiments in this paper, the weighting parameter “a” of the improved CICA method was a value from 0.1 to 0.9 with a step length of 0.1, thus making the results include nine kinds of situations. To guarantee the consistency of the selection of the optimal weighting parameters for the simulated data and real data experiments, the index obtained by formula (14) was used to select the best situation, and the evaluation results of all situations obtained by formula (14) are shown in Appendix Tables 1 and 2, which correspond to simulated-data and real-data experiments, respectively, and are presented in Appendix 2. However, sometimes the final evaluation results may be different when adopting a different evaluation index. For example, in the simulated data experiment in this paper, if we use the average of AUCs (see Fig. 3) and CCs (see Fig. 4) to choose the optimal weighting parameter, the best situation is a = 0.9 for dataset 18, which is different from the results obtained by formula (14). The evaluation results of all situations obtained by this approach are shown in Appendix Table 3, which is also presented in Appendix 2.

In addition, tensor decomposition (TD) has also shown better performance with multi-subject fMRI data analysis in recent years due to its ability to retain multi-way linkages and interactions present in the data [44], and it can be used to obtain common spatial maps (SMs), common time courses (TCs), and subject-specific intensities [45, 46]. However, TD methods such as canonical polyadic decomposition (CPD), which is a popular TD method, may sometimes converge to a locally optimal solution because of the noise in the fMRI data. To improve the robustness of CPD with respect to noise, some additional properties have been efficiently incorporated into CPD as modality constraints [47]. For example, Beckmann and Smith proposed a solution using statistical independence as a spatial modality constraint in TD by combining ICA with CPD [45]. Recently, Kuang et al. proposed a new combined ICA and CPD method that incorporates TC delays into a CP model as the temporal constraint to obtain the shared TC and then estimates the shared SM using a least-squares fit after shift-invariant CPD [48]. Therefore, how to extract a priori information from the data itself and how to introduce it into the TD method to improve fMRI data analysis will be questions worth studying in the future.

5 Conclusions

In this paper, we proposed a multi-objective optimization-based improved CICA method with temporal a priori information extracted from group subject data and then used it for group fMRI data analysis. The experimental results of simulated and real fMRI data showed that the group data analysis using the improved CICA method was better than that of FastICA in both spatial and temporal domains, and it not only increased the accuracy of spatial sources and time courses but also improved the correlation of the GIC with the corresponding IC of each subject in the group. Compared to the CICA method, it only performed better temporally, and there was a slight deficiency in spatial source signal recovery. On the whole, it has its own advantages in fMRI data analysis as a blind source separation method.