Introduction

Alzheimer’s disease (AD), characterized by a progressive decline in episodic memory, reasoning, and other cognitive domains (Alzheimer's Association 2015), is considered the most common form of dementia. AD incidence has been reported to increase exponentially with age (Ziegler-Graham et al. 2008), and 1 in 85 persons worldwide is projected to suffer from the disease by 2050 (Brookmeyer et al. 2007). The latest statistics show that AD is the fifth leading cause of death for people over 65 years old and the sixth leading cause of death in the United States (Alzheimer's Association 2015). Mild cognitive impairment (MCI) is considered the transitional state between normal aging and AD (Petersen et al. 2001; Gauthier et al. 2006; Li et al. 2019b). Approximately 10% to 15% of MCI patients progress to AD every year, and more than half of them develop AD within 5 years (Petersen et al. 2001; Gauthier et al. 2006). Because there is currently no effective treatment to stop the neuronal damage that leads to the clinical symptoms of AD (Alzheimer's Association 2015), and because of this high conversion rate, early diagnosis at the MCI stage is of great value: accurately identifying MCI allows appropriate actions to be taken to slow down the progression of the disease. However, MCI is difficult to diagnose because of its relatively subtle symptoms (Eshkoor et al. 2015). Consequently, many researchers have turned to neuroimaging techniques to aid the diagnosis of AD and MCI (Davatzikos et al. 2011; Suk et al. 2014; Huang et al. 2010; Liu et al. 2014).

Neuroimaging techniques, such as Magnetic Resonance Imaging (MRI) (Fan et al. 2008; Hu et al. 2016; Cuingnet et al. 2011) and Magnetoencephalography (MEG) (Amezquita-Sanchez et al. 2016), are considered powerful tools for the classification of neurodegenerative diseases, especially MCI and AD (Josef Golubic et al. 2017; Sandanalakshmi and Sardius 2016). In particular, resting-state functional Magnetic Resonance Imaging (rs-fMRI), which uses the blood-oxygenation-level-dependent (BOLD) signal as a neurophysiological index, has recently been used for the early diagnosis of MCI before the appearance of clinical symptoms (McKenna et al. 2016; Wee et al. 2016; Khazaee et al. 2017). The correlation between the BOLD signals of two brain regions is regarded as the functional connectivity between that region pair (Greicius 2008). The functional connectivity, or temporal correlations, among all regions within the brain (Van Den Heuvel and Pol 2010) is often characterized through functional connectivity networks using graph theory (Fornito et al. 2010; Lee et al. 2017; Li et al. 2013). Differences between normal functional connectivity networks and those damaged by pathology can serve as biomarkers for studying the pathological underpinnings of MCI (Jie et al. 2014; Li et al. 2014; Qi et al. 2010; Chand et al. 2017; Wee et al. 2014).

The most classical functional connectivity network modeling approach is based on the pairwise Pearson’s correlation coefficient (Wee et al. 2012a; Power et al. 2011). Correlation-based methods are easy to understand and computationally inexpensive. However, each brain region is actually connected to only a limited number of other regions, not to all of them (Liao et al. 2017). Thus, correlation-based networks are inconsistent with the sparse nature of actual brain networks (Lee et al. 2011; Li et al. 2014). To overcome this limitation, several sparse modeling approaches have been proposed to construct sparse functional connectivity networks from whole-brain connectivity information, i.e., to identify a small number of connections among the dense whole-brain connections. For example, Lee et al. (2011) constructed sparse brain networks based on an l1-norm regularized linear regression model and derived the differences in modular structure between the sparse brain networks of subjects with autism spectrum disorders and those of pediatric control subjects. Rosa et al. (2015) proposed a sparse functional connectivity network modeling method based on l1-norm regularized maximum likelihood estimation and Gaussian graphical models, and discovered discriminative changes in brain networks between patients with major depressive disorder and normal controls (NCs). These sparse brain network modeling methods are considered better at rejecting spurious connections than correlation-based methods (Lee et al. 2011; Rosa et al. 2015). In addition, to minimize the influence of inter-subject variability, some recent approaches have adopted l2,1-norm penalization to ensure consistent non-zero connections across subjects (Li et al. 2018d, c; Wee et al. 2014). For example, a sparse causality model aided by l2,1-norm penalization was proposed to detect causal interactions in multivariate time series (Haufe et al. 2008). 
This group-constrained causality model outperforms several classical methods, including Granger causality, ridge regression, and Lasso, in causal structure detection. Ryali et al. (2012) adopted a sparse partial correlation method that combines l1- and l2-norm penalization to estimate the functional connectivity between brain regions in fMRI data, where the l1-norm penalty provides sparse, interpretable solutions and the l2-norm penalty improves the sensitivity of the model. Thus, this method can provide more accurate estimates of brain connections. Furthermore, to study the directional causal interactions among brain regions, a cross-spectral density connectivity estimation method was proposed to estimate effective connectivity from whole-brain fMRI data (Lennartz et al. 2018). Because it depends less on hemodynamic variability, this method may produce more reliable connectivity estimates. In addition, a group-constrained effective connectivity inference method was proposed for MCI identification (Li et al. 2018d). This approach combines l2,1-norm penalization with time-dependent effective connectivity estimation, and thus can generate effective brain connectivity networks with a consistent topology across subjects. In all the aforementioned modeling methods, the vertices of the brain networks correspond to brain regions and the edges correspond to the correlations between brain regions, thus producing so-called low-order networks (Chen et al. 2016). Notably, a low-order correlation network is normally calculated over the whole time series, without considering potential temporal variations in the correlations (Chen et al. 2016). However, brain activity is not static across the entire scanning period, and the correlations among brain regions vary over time (Allen et al. 2014; Hutchison et al. 2013; Liu et al. 2017). 
Therefore, conventional methods, which ignore the temporal variations in correlations, may fail to diagnose MCI accurately (Chen et al. 2017). Recently, a high-order network modeling method has been proposed to preserve the dynamic correlation information neglected by conventional methods (Chen et al. 2016). In a high-order network, the vertices correspond to brain region pairs and the edges correspond to the correlations between these region pairs. The high-order network modeling approach takes the temporal variations in correlations into account and discovers discriminative dynamic correlation information for MCI classification. However, the existing high-order network modeling approach (Chen et al. 2016) is derived from low-order functional connectivity based on pairwise Pearson’s correlation, which is inconsistent with the sparse nature and small-world characteristics of most biological networks (Supekar et al. 2008).

To overcome this deficiency, we propose a novel high-order network modeling method that uses a specially designed unified sparse regression framework. Specifically, we construct the high-order functional connectivity networks by combining a novel ULS Group constrained topology structure detection algorithm with an Ultra-Orthogonal Least Squares (UOLS) algorithm. The former, which consists of an ultra-least squares (ULS) criterion and a Group constrained topology structure detection algorithm, is applied to detect the topology of the functional connectivity networks. The latter, which consists of the ULS criterion and an Orthogonal Least Squares (OLS) algorithm (Li et al. 2019a), is employed to estimate the strength of the functional connections. The rationale for using the ULS criterion is that, besides extracting the classical dependence between the fMRI time series of a region pair, it also extracts the dependence between their weak derivatives, and thus helps avoid the overfitting problem that is common with the conventional least squares criterion (Guo et al. 2016). The weak derivative, which can be computed for all integrable functions, describes the interconnections among data points (its definition is given in the Appendix). In contrast, methods based on the classical least squares (LS) criterion do not take into account the continuity of fMRI time series, and the absence of this connection information between data points may lead to an inaccurate model structure. To overcome this limitation, we integrate the ULS criterion, which can describe the relationship among data points, into our proposed framework. Additionally, unlike traditional sparse regression algorithms with l1-norm penalization, which lead to different network structures for individual subjects (Lee et al. 2011), the Group constrained topology structure detection algorithm in our modeling method uses l2,1-norm penalization to encourage an identical network topology across subjects. An identical network topology eases comparison between subjects and thus achieves better generalization performance in brain disease classification (Wee et al. 2014; Zhu et al. 2014).

The high-order network can encode the temporal variations in the correlations between brain regions, but it cannot characterize the holistic correlation computed over the whole time series, as the low-order correlation network does. Therefore, to incorporate both the low-order correlation and the temporally dynamic information encoded in the high-order correlation for better classification performance, we first construct a decision tree (DCT) for each type of correlation and then fuse their classification scores to provide the final classification decision. The fused DCT model thus takes into account not only the correlations derived from the whole time series, but also the temporal variations of these correlations. We compared our proposed framework (i.e., fusion of high-order and low-order functional connectivity networks) with state-of-the-art methods on the same dataset, and the experimental results demonstrate the superiority of the proposed framework for MCI classification.

In summary, the main contributions of our proposed framework are three-fold:

  • Taking into consideration the discrepancy between the weak derivatives of the observed signals and the model prediction function during functional connectivity network estimation;

  • Deriving the functional networks using a sparse regression framework that preserves the sparse nature of brain networks while enforcing an identical network topology across all subjects to ease between-subject comparison;

  • Simultaneously considering the dynamic correlation information and the holistic correlation information for MCI classification by fusing the high-order and low-order networks.

The rest of the paper is organized as follows. The “Materials and Methods” section provides information on data acquisition and preprocessing, followed by the proposed framework for constructing and fusing low- and high-order functional connectivity networks for MCI classification. Then, we evaluate and discuss the performance of the proposed framework in the “Results and Discussions” section. Finally, we conclude the paper in the “Conclusion” section.

Materials and Methods

Proposed Framework

Fig. 1 shows the proposed MCI classification framework, which is based on the fusion of high-order and low-order sparse functional connectivity networks. The “Data Acquisition and Preprocessing” section provides the details of data acquisition and preprocessing. In the “Low-Order Functional Connectivity Networks” and “High-Order Functional Connectivity Networks” sections, we construct the low- and high-order functional connectivity networks, respectively. The “Feature Extraction, Selection, and Classification” section describes feature extraction, feature selection, and fusion classification. Finally, we summarize the methodology in the “Summary of Methodology” section.

Fig. 1 The schematic diagram of the proposed framework

Data Acquisition and Preprocessing

This study was approved by the local ethics committee, and all participants took part voluntarily and gave informed consent. The participants were recruited via advertisements in local newspapers and media. All participants underwent routine neuropsychological assessments, including the Mini-Mental State Examination (MMSE) (Van Patten et al. 2018), the Hospital Anxiety and Depression Scale (HAD) (McKenzie et al. 2018), and Lawton’s Instrumental Activities of Daily Living (IADL) scale (Mao et al. 2018). All participants were also evaluated using the Clinical Dementia Rating scale (CDR) (Das et al. 2018). MCI patients were diagnosed according to the criteria proposed by Petersen (2004): (1) a memory loss complaint corroborated by an informant; (2) objective cognitive impairment in single or multiple domains, adjusted for age and education; (3) preserved general cognitive function; (4) failure to meet the criteria for dementia; and (5) a CDR score of 0.5. The inclusion criteria for NCs were: (1) no complaint of memory loss; (2) a CDR score of 0; and (3) no severe visual or auditory impairment.

Twenty-eight MCI patients and thirty-three demographically matched NCs were selected from the participants. All subjects were scanned using a standard echo-planar imaging (EPI) sequence on a 3 Tesla Siemens TRIO scanner with the following parameters: TR = 3000 ms, TE = 30 ms, acquisition matrix = 74 × 74, 45 slices, and voxel thickness = 3 mm. One hundred and thirty resting-state fMRI volumes were acquired. A standard preprocessing pipeline was applied to the fMRI images using the Statistical Parametric Mapping 8 (SPM8) software package, including removal of the first 10 fMRI volumes, slice timing correction, head-motion correction, regression of nuisance signals (ventricle, white matter, global signal, and head motion with Friston’s 24-parameter model (Wee et al. 2016)), and signal detrending. Next, the brain was parcellated into 90 regions of interest (ROIs) based on the automated anatomical labeling (AAL) atlas (Tzourio-Mazoyer et al. 2002). Finally, a band-pass filter (0.01–0.08 Hz) was applied to remove the effects of low- and high-frequency noise.
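The band-pass step can be sketched in Python with SciPy. The text does not state which filter design was used, so the zero-phase Butterworth filter, its order, and the function name `bandpass` below are assumptions of this illustration, not the authors' implementation; only the 0.01–0.08 Hz band and TR = 3 s come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

TR = 3.0  # repetition time in seconds, from the acquisition parameters


def bandpass(ts, low=0.01, high=0.08, tr=TR, order=4):
    """Zero-phase band-pass filter applied along the time axis (axis 0).

    ts: array of shape (Z, M), one column per ROI time series.
    The Butterworth design and the filter order are assumptions of this sketch.
    """
    # Sampling frequency is 1/TR; cut-offs are given in Hz thanks to the fs argument
    sos = butter(order, [low, high], btype="bandpass", output="sos", fs=1.0 / tr)
    # Forward-backward filtering avoids introducing a phase shift into the BOLD series
    return sosfiltfilt(sos, ts, axis=0)
```

Second-order sections (`output="sos"`) are used because the lower cut-off is a small fraction of the sampling rate, where the transfer-function form is numerically fragile.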

Low-Order Functional Connectivity Networks

ULS Group Constrained Topology Structure Detection

Suppose that there are N subjects (N = 61), each with M ROIs (M = 90), and let \( {\boldsymbol{y}}_m^n \) denote the time series of the m-th ROI of the n-th subject. Compared with the widely used Lasso algorithm (Lee et al. 2011) with l1-norm penalization, which may generate different network topologies for different subjects, the Group constrained topology structure detection algorithm with l2,1-norm penalization minimizes this inter-subject variability by encouraging an identical network topology across subjects (Wee et al. 2014). The objective function of the Group constrained topology structure detection algorithm is given by

$$ f\left({\boldsymbol{\varTheta}}_m\right)={\sum}_{n=1}^N{\left\Vert {\boldsymbol{y}}_m^n-{\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\right\Vert}_2^2+\lambda {\left\Vert {\boldsymbol{\varTheta}}_m\right\Vert}_{2,1} $$
(1)

where \( {\boldsymbol{A}}_m^n=\left[{\boldsymbol{y}}_1^n,\dots, {\boldsymbol{y}}_{m-1}^n,{\boldsymbol{y}}_{m+1}^n,\dots, {\boldsymbol{y}}_M^n\right] \) is a matrix containing the time series of all ROIs except the m-th ROI, λ > 0 is the regularization parameter that controls the sparsity level of the regression model, \( {\boldsymbol{\theta}}_m^n={\left[{\theta}_1^n,\dots, {\theta}_{m-1}^n,{\theta}_{m+1}^n,\dots, {\theta}_M^n\right]}^T \) is the weight vector describing the relationship between the m-th ROI and the other eighty-nine ROIs for the n-th subject, and \( {\boldsymbol{\varTheta}}_m=\left[{\boldsymbol{\theta}}_m^1,{\boldsymbol{\theta}}_m^2,\dots, {\boldsymbol{\theta}}_m^N\right] \) is the weight matrix for all subjects. Note that each row of the matrix Θm contains the coefficients of one ROI for all subjects, while each column of Θm contains the coefficients of all ROIs for one subject (i.e., \( {\boldsymbol{\theta}}_m^n \)). The norm \( {\left\Vert {\boldsymbol{\varTheta}}_m\right\Vert}_{2,1} \) is the sum of the l2-norms of the rows of Θm, not the sum of the l2-norms of its columns (i.e., not \( {\sum}_{n=1}^N{\left\Vert {\boldsymbol{\theta}}_m^n\right\Vert}_2 \)). Therefore, Eq. (1) encourages the coefficients of a given ROI to be jointly zero or jointly non-zero across all subjects, and it cannot be decomposed into N independent per-subject problems of the form \( f\left({\boldsymbol{\theta}}_m^n\right)={\left\Vert {\boldsymbol{y}}_m^n-{\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\right\Vert}_2^2+\lambda {\left\Vert {\boldsymbol{\theta}}_m^n\right\Vert}_2\ \left(n=1,2,\dots, N\right) \). Constructing the sparse network structure can thus be cast as an optimization problem, i.e., minimizing the above objective function.
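A minimal sketch of how Eq. (1) could be minimized, assuming a proximal-gradient (ISTA) solver; the text does not say which optimizer was used, and the function names are hypothetical. The point it illustrates is that the proximal step of the l2,1-norm shrinks each row of Θm as a whole, so a candidate ROI is kept or removed for all subjects at once. Passing ultra-ROI series instead of raw series would give the ULS variant of Eq. (3).

```python
import numpy as np


def l21_norm(theta):
    """||Theta||_{2,1}: sum of the l2-norms of the rows (row = predictor ROI, column = subject)."""
    return float(np.sum(np.linalg.norm(theta, axis=1)))


def group_sparse_regression(y, A, lam, n_iter=1000):
    """Proximal-gradient (ISTA) sketch for Eq. (1).

    y: (N, T) target-ROI series, one row per subject.
    A: (N, T, P) predictor matrices A_m^n (P = M - 1 other ROIs).
    Returns Theta_m with shape (P, N).
    """
    N, T, P = A.shape
    theta = np.zeros((P, N))
    # Step size 1/L, where L bounds the Lipschitz constant of the summed squared-error gradient
    lip = 2.0 * max(np.linalg.norm(A[n], 2) ** 2 for n in range(N))
    step = 1.0 / lip
    for _ in range(n_iter):
        # Gradient of sum_n ||y^n - A^n theta^n||_2^2, one column per subject
        resid = np.einsum("ntp,pn->nt", A, theta) - y
        grad = 2.0 * np.einsum("ntp,nt->pn", A, resid)
        z = theta - step * grad
        # Proximal operator of step*lam*||.||_{2,1}: group soft-thresholding of each row,
        # so a ROI is kept or dropped jointly for all subjects
        norms = np.linalg.norm(z, axis=1, keepdims=True)
        theta = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12)) * z
    return theta
```

Rows that end up exactly zero correspond to ROIs judged unconnected to the target ROI in every subject, which is the topology indicator used later in this section.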

To obtain a more accurate measure of model fit, a ULS criterion is integrated into the Group constrained topology structure detection algorithm, yielding the ULS Group constrained topology structure detection algorithm. The ULS criterion, which considers the discrepancy between the weak derivatives of the observed signals and those of the model prediction function and thus helps avoid overfitting, is defined by

$$ {J}_{ULS}={\left\Vert {\overset{\sim }{\boldsymbol{y}}}_m^n-{\overset{\sim }{\boldsymbol{A}}}_m^n{\boldsymbol{\theta}}_m^n\right\Vert}_2^2, $$
(2)

where \( {\overset{\sim }{\boldsymbol{y}}}_m^n={\left[{\left({\boldsymbol{y}}_m^n\right)}^T,{\left({D}^1{\boldsymbol{y}}_m^n\right)}^T,{\left({D}^2{\boldsymbol{y}}_m^n\right)}^T,\dots, {\left({D}^L{\boldsymbol{y}}_m^n\right)}^T\right]}^T \), formed by concatenating the original ROI time series \( {\boldsymbol{y}}_m^n \) with its weak derivatives \( {D}^l{\boldsymbol{y}}_m^n\left(l=1,2,\dots, L\right) \), is the ultra-ROI time series of the m-th ROI of the n-th subject, and \( {\overset{\sim }{\boldsymbol{A}}}_m^n=\left[{\overset{\sim }{\boldsymbol{y}}}_1^n,\dots, {\overset{\sim }{\boldsymbol{y}}}_{m-1}^n,{\overset{\sim }{\boldsymbol{y}}}_{m+1}^n,\dots, {\overset{\sim }{\boldsymbol{y}}}_M^n\right] \) is a matrix consisting of the ultra-ROI time series of all ROIs except the m-th ROI itself (details of the ULS criterion and weak derivatives are provided in the Appendix). Specifically, as discussed in the Appendix, the ULS criterion can be incorporated into the Group constrained topology structure detection algorithm by replacing the original ROI time series \( {\boldsymbol{y}}_m^n \) with the ultra-ROI time series \( {\overset{\sim }{\boldsymbol{y}}}_m^n \) in Eq. (1). The ULS Group constrained topology structure detection algorithm therefore detects a more accurate network topology via the following objective function:
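Because the Appendix definition of the weak derivative is not reproduced here, the following is only a hypothetical stand-in for how ultra-ROI series might be formed. It uses the defining identity of the weak derivative, \( \int {D}^l y\,\phi =(-1)^l\int y\,{\phi}^{(l)} \), with a compactly supported test function; the bump \( (1-s^2)^p \), its half-width, and the names `weak_derivative` and `ultra_series` are all assumptions, not the authors' construction.

```python
import numpy as np
from numpy.polynomial import Polynomial


def weak_derivative(y, order, half_width=5, power=4):
    """Approximate the order-th weak derivative of a sampled series y (hypothetical stand-in).

    Uses a compactly supported bump phi(s) = (1 - s^2)^power on [-1, 1], rescaled to
    half_width samples and normalised to unit discrete mass. power must exceed order so
    that phi and its first `order` derivatives vanish at the support boundary.
    Returns only interior samples (length len(y) - 2 * half_width).
    """
    if power <= order:
        raise ValueError("power must be larger than the derivative order")
    h = half_width
    phi = Polynomial([1.0, 0.0, -1.0]) ** power
    s = np.arange(-h, h + 1) / h
    mass = phi(s).sum()  # discrete normalisation so the rescaled bump sums to 1
    dphi = phi.deriv(order) if order > 0 else phi
    # Samples of d^l/du^l [phi(u/h) / mass] at the integer offsets u = -h..h
    kernel = dphi(s) / (mass * h ** order)
    # np.convolve evaluates sum_m y[m] * kernel(t - m): the l-th derivative of the mollified
    # signal, which by the weak-derivative identity equals (-1)^l * sum_m y[m] * phi_h^(l)(m - t)
    return np.convolve(y, kernel, mode="valid")


def ultra_series(y, L, half_width=5):
    """Stack y with its approximate weak derivatives D^1 y, ..., D^L y into one long vector."""
    parts = [y] + [weak_derivative(y, l, half_width, power=L + 3) for l in range(1, L + 1)]
    return np.concatenate(parts)
```

Applying `ultra_series` to every column of \( {\boldsymbol{A}}_m^n \) and to \( {\boldsymbol{y}}_m^n \) would produce the inputs of Eq. (3) under these assumptions.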

$$ f\left({\boldsymbol{\varTheta}}_m\right)={\sum}_{n=1}^N{\left\Vert {\overset{\sim }{\boldsymbol{y}}}_m^n-{\overset{\sim }{\boldsymbol{A}}}_m^n{\boldsymbol{\theta}}_m^n\right\Vert}_2^2+\lambda {\left\Vert {\boldsymbol{\varTheta}}_m\right\Vert}_{2,1}. $$
(3)

By separating the weak derivative part \( {D}^l{\boldsymbol{y}}_m^n\left(l=1,2,\dots, L\right) \) from the original ROI time series part \( {\boldsymbol{y}}_m^n \), Eq. (3) can be rewritten as

$$ f\left({\boldsymbol{\varTheta}}_{\boldsymbol{m}}\right)={\sum \limits}_{n=1}^N{\left\Vert {\boldsymbol{y}}_m^n-{\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\right\Vert}_2^2+{\sum \limits}_{n=1}^N{\sum \limits}_{l=1}^L{\left\Vert {D}^l{\boldsymbol{y}}_m^n-{D}^l{\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\right\Vert}_2^2+\lambda {\left\Vert {\boldsymbol{\varTheta}}_{\boldsymbol{m}}\right\Vert}_{2,1} $$
(4)
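The passage from Eq. (3) to Eq. (4) uses only two facts: each \( {D}^l \) is linear, so \( {D}^l\left({\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\right)=\left({D}^l{\boldsymbol{A}}_m^n\right){\boldsymbol{\theta}}_m^n \) with \( {D}^l \) applied column by column, and the squared l2-norm of a stacked vector equals the sum of the squared norms of its blocks. For the n-th subject:

$$ {\left\Vert {\overset{\sim }{\boldsymbol{y}}}_m^n-{\overset{\sim }{\boldsymbol{A}}}_m^n{\boldsymbol{\theta}}_m^n\right\Vert}_2^2={\left\Vert \begin{bmatrix}{\boldsymbol{y}}_m^n-{\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\\ {D}^1{\boldsymbol{y}}_m^n-{D}^1{\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\\ \vdots \\ {D}^L{\boldsymbol{y}}_m^n-{D}^L{\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\end{bmatrix}\right\Vert}_2^2={\left\Vert {\boldsymbol{y}}_m^n-{\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\right\Vert}_2^2+{\sum}_{l=1}^L{\left\Vert {D}^l{\boldsymbol{y}}_m^n-{D}^l{\boldsymbol{A}}_m^n{\boldsymbol{\theta}}_m^n\right\Vert}_2^2 $$

Summing over n = 1, …, N and adding \( \lambda {\left\Vert {\boldsymbol{\varTheta}}_m\right\Vert}_{2,1} \) gives Eq. (4) term by term.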

It should be noted that the coefficient matrix Θm estimated from Eq. (4) cannot be regarded as the functional connectivity strengths, because the coefficients are biased by the group-constrained sparse penalization (Li et al. 2018d). In particular, some coefficients are even negative, which makes the functional connectivity network difficult to interpret and analyze. Therefore, the coefficient matrix Θm estimated from Eq. (4) is used only as an indicator of network topology. ROIs with non-zero elements in Θm are considered to be functionally connected with the target m-th ROI, while zero elements in Θm indicate no connection between the corresponding ROI and the target m-th ROI.

Strength Estimation of Sparse Functional Connectivity Networks Via UOLS

Suppose that P ROIs have been found to be correlated with the target m-th ROI by the ULS Group constrained topology structure detection algorithm described in the “ULS Group Constrained Topology Structure Detection” section. The associated ultra-ROI time series of these ROIs, \( {\overset{\sim }{\boldsymbol{y}}}_{m_p}^n={\left[{\left({\boldsymbol{y}}_{m_p}^n\right)}^T,{\left({D}^1{\boldsymbol{y}}_{m_p}^n\right)}^T,{\left({D}^2{\boldsymbol{y}}_{m_p}^n\right)}^T,\dots, {\left({D}^L{\boldsymbol{y}}_{m_p}^n\right)}^T\right]}^T\ \left(p=1,2,\dots, P\right) \), are selected as the candidate time series for functional connectivity strength estimation, while the ultra-ROI time series of the other ROIs are discarded. Then, we use a UOLS algorithm to estimate the functional connectivity strengths between these P ROIs and the target m-th ROI (Li et al. 2018d). The UOLS algorithm combines the ULS criterion (Eq. (2)) with the OLS algorithm (Guo et al. 2016). As shown in the Appendix, the ULS criterion can be integrated into the OLS algorithm by incorporating the weak derivatives into the original ROI time series. Therefore, the UOLS algorithm is obtained by replacing the original ROI time series in the OLS algorithm with the ultra-ROI time series (containing the original ROI time series and its weak derivatives). The detailed procedure of the UOLS algorithm is given in Table 1. In the UOLS algorithm, the functional connectivity strength is estimated in a stepwise orthogonal forward procedure. The value of \( MaxUerr\left({\overset{\sim }{\boldsymbol{y}}}_m^n,{\overset{\sim }{\boldsymbol{y}}}_{m_p}^n\right) \) is regarded as the functional connectivity strength between the target m-th ROI and the p-th candidate ROI.
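Since the contents of Table 1 are not reproduced here, the sketch below follows the standard forward OLS procedure with the error reduction ratio (ERR): each step orthogonalizes the remaining candidates against those already selected and picks the one that explains the largest fraction of the target's energy. That MaxUerr corresponds to the ERR computed on ultra-ROI series is our assumption, and `ols_err` is a hypothetical name.

```python
import numpy as np


def ols_err(y, X, tol=1e-12):
    """Forward orthogonal least squares with the error reduction ratio (ERR).

    y: (T,) target series; X: (T, P) candidate series, one column per candidate ROI.
    For the UOLS variant, pass ultra-ROI series (original series stacked with
    weak derivatives) for y and for every column of X.
    Returns err (P,), the ERR assigned to each candidate, and the selection order.
    """
    T, P = X.shape
    yy = float(y @ y)
    remaining = list(range(P))
    basis = []  # orthogonalised versions of the already selected candidates
    err = np.zeros(P)
    order = []
    while remaining:
        best, best_err, best_w = None, -1.0, None
        for p in remaining:
            w = X[:, p].astype(float)
            for q in basis:  # modified Gram-Schmidt against the selected terms
                w = w - (q @ w) / (q @ q) * q
            ww = float(w @ w)
            if ww < tol:  # candidate lies (numerically) in the span of the selected ones
                continue
            e = float(y @ w) ** 2 / (ww * yy)
            if e > best_err:
                best, best_err, best_w = p, e, w
        if best is None:
            break
        err[best] = best_err
        order.append(best)
        basis.append(best_w)
        remaining.remove(best)
    return err, order
```

Because the selected terms are mutually orthogonal, the ERRs add up to the fraction of the target's energy explained by the chosen candidates, which makes them directly comparable as strengths.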

Table 1 Strength estimation of functional connectivity via UOLS algorithm

In summary, we first repeat the procedure of the “ULS Group Constrained Topology Structure Detection” section M times to detect the network topology for all subjects (each time, a different ROI is selected as the target ROI in Eq. (4) for all subjects). Then, the connectivity strengths for all subjects are estimated by repeating the procedure of the “Strength Estimation of Sparse Functional Connectivity Networks via UOLS” section M × N times, each time regarding a different ROI of one subject as the target ROI. In this way, we obtain a low-order ULS Group constrained-UOLS network for each subject.

High-Order Functional Connectivity Networks

Ultra-ROI Time Series Segment

The first step in constructing the high-order network is to use a sliding window to partition each ROI time series into multiple overlapping segments. For an ROI time series \( {\boldsymbol{y}}_m^n \) containing Z temporal image volumes, the total number of segments generated by the sliding window is \( K=\left\lfloor \left(Z-S\right)/r\right\rfloor +1 \), where S is the size of the sliding window and r is the step size between adjacent windows. Letting \( {\boldsymbol{y}}_m^n(k) \) be the k-th segment generated from \( {\boldsymbol{y}}_m^n \), the k-th segments of all M ROIs of the n-th subject can be represented in matrix form as \( {\boldsymbol{Y}}^n(k)=\left[{\boldsymbol{y}}_1^n(k),{\boldsymbol{y}}_2^n(k),\dots, {\boldsymbol{y}}_M^n(k)\right]\in {R}^{S\times M} \). Further, the set of the k-th segments for all subjects and all ROIs can be represented by \( \boldsymbol{Y}(k)\in {R}^{S\times M\times N} \). Thus, by applying a sliding window to all fMRI time series, we obtain K sets of time series segments Y(k). Then, we generate the ultra-ROI time series segment \( \overset{\sim }{\boldsymbol{Y}}(k)={\left[{\left(\boldsymbol{Y}(k)\right)}^T,{\left({D}^1\boldsymbol{Y}(k)\right)}^T,{\left({D}^2\boldsymbol{Y}(k)\right)}^T,\dots, {\left({D}^L\boldsymbol{Y}(k)\right)}^T\right]}^T \) by concatenating the original time series segment Y(k) with its weak derivatives \( {D}^l\boldsymbol{Y}(k)\ \left(l=1,2,\dots, L\right) \), where the superscript T in \( {\left(\boldsymbol{Y}(k)\right)}^T \) denotes the transpose that interchanges the first and second dimensions of the three-dimensional tensor. The details of the weak derivative and the ULS criterion are provided in the Appendix.
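The window count \( K=\left\lfloor \left(Z-S\right)/r\right\rfloor +1 \) and the resulting (K, S, M) layout for one subject can be checked with a short sketch. The function name is hypothetical, and the window size S and step r are left as parameters because this passage does not fix their values.

```python
import numpy as np


def sliding_windows(Y, S, r):
    """Split Y (Z time points x M ROIs) into K overlapping windows of S points with step r.

    Returns an array of shape (K, S, M), where K = floor((Z - S) / r) + 1.
    """
    Z = Y.shape[0]
    if S > Z:
        raise ValueError("window size S cannot exceed the series length Z")
    K = (Z - S) // r + 1  # integer division implements the floor
    # Window k covers time points k*r, ..., k*r + S - 1
    return np.stack([Y[k * r : k * r + S] for k in range(K)])
```

Any trailing points that do not fill a complete window are dropped, which is what the floor in the formula expresses.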

Construction of Temporal Low-Order Functional Connectivity Networks

For each ultra-ROI time series segment \( \overset{\sim }{\boldsymbol{Y}}(k) \), we apply the ULS Group constrained topology structure detection algorithm across subjects to detect the functional connectivity network structure, and then apply UOLS at the individual level to re-estimate the strength of each detected connection, producing a total of K × N temporal low-order functional connectivity networks. Taking each ROI with \( \left\{{\boldsymbol{y}}_m^n(k)\right\} \) as a vertex and \( \left\{{C}_{m_1,{m}_2}^n(k)\right\}\left(1\le {m}_1\le M,1\le {m}_2\le M,{m}_1\ne {m}_2\right) \) as the connectivity strength of each pair of vertices, the temporal low-order functional connectivity network can be expressed as \( {\boldsymbol{G}}^n(k)=\left(\left\{{\boldsymbol{y}}_m^n(k)\right\},{C}_{m_1,{m}_2}^n(k)\right)\ \left(k=1,2,\dots, K\right) \), where \( \left\{{C}_{m_1,{m}_2}^n(k)\right\} \) is the correlation between the m1-th ROI and the m2-th ROI of the n-th subject in the k-th window. A larger value of \( {C}_{m_1,{m}_2}^n(k) \) indicates a stronger connection between the m1-th and m2-th ROIs in the k-th window.

The next step is to extract the correlation time series from the temporal low-order functional connectivity networks. For the n-th subject, the correlation time series of each ROI pair (m1, m2), \( {\boldsymbol{C}}_{m_1,{m}_2}^n={\left[{C}_{m_1,{m}_2}^n(1),{C}_{m_1,{m}_2}^n(2),\dots, {C}_{m_1,{m}_2}^n(K)\right]}^T\in {R}^K \), is obtained by concatenating all \( {C}_{m_1,{m}_2}^n(k)\ \left(k=1,2,\dots, K\right) \). Unlike \( {\boldsymbol{y}}_m^n \), which represents the time series of an ROI, \( {\boldsymbol{C}}_{m_1,{m}_2}^n \) characterizes how the correlation of the ROI pair (m1, m2) varies over time (Chen et al. 2016). Because the low-order network is asymmetric, the total number of correlation time series \( \left\{{\boldsymbol{C}}_{m_1,{m}_2}^n|1\le {m}_1\le M,1\le {m}_2\le M,{m}_1\ne {m}_2\right\} \) for each subject is M(M − 1), i.e., 8010 for M = 90.
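Extracting the M(M − 1) correlation time series from K asymmetric networks amounts to reading every off-diagonal entry of a stack of matrices. A sketch for one subject, assuming the K temporal networks are stored as a (K, M, M) NumPy array (a hypothetical layout; the function name is also hypothetical):

```python
import numpy as np


def correlation_time_series(C):
    """C: (K, M, M) temporal low-order networks of one subject, C[k, m1, m2] = C_{m1,m2}(k).

    Returns an (M*(M-1), K) array with one row per ordered ROI pair (m1 != m2), and the
    (m1, m2) index of each row. Both directions are kept because the networks are asymmetric.
    """
    K, M, _ = C.shape
    off_diag = ~np.eye(M, dtype=bool)
    # A boolean mask over the last two axes selects entries in row-major order -> (K, M*(M-1))
    series = C[:, off_diag].T
    # np.argwhere lists the same entries in the same row-major order
    pairs = np.argwhere(off_diag)
    return series, pairs
```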

Construction of High-Order Functional Connectivity Networks

The final step is to construct high-order functional connectivity networks based on the correlation time series. For the set of all correlation time series \( \left\{{\boldsymbol{C}}_{m_1,{m}_2}^n|1\le {m}_1\le M,1\le {m}_2\le M,{m}_1\ne {m}_2,1\le n\le N\right\} \), we employ the ULS Group constrained topology structure detection algorithm to detect the high-order network structure and then employ UOLS to estimate the high-order connectivity strengths, establishing a total of N high-order networks with an identical network structure. Specifically, for each pair of correlation time series \( \left(\left\{{\boldsymbol{C}}_{m_1,{m}_2}^n\right\},\left\{{\boldsymbol{C}}_{m_3,{m}_4}^n\right\}\right) \), \( \left\{{E}_{\left({m}_1,{m}_2\right),\left({m}_3,{m}_4\right)}^n\right\} \) denotes the high-order correlation between them. In other words, \( \left\{{E}_{\left({m}_1,{m}_2\right),\left({m}_3,{m}_4\right)}^n\right\} \) is the high-order connectivity strength between the ROI pairs (m1, m2) and (m3, m4). Taking each ROI pair (m1, m2) with \( \left\{{\boldsymbol{C}}_{m_1,{m}_2}^n\right\} \) as a vertex and \( \left\{{E}_{\left({m}_1,{m}_2\right),\left({m}_3,{m}_4\right)}^n\right\} \) as the edge weights, the high-order network can be expressed as \( {\boldsymbol{G}}^n=\left(\left\{{\boldsymbol{C}}_{m_1,{m}_2}^n\right\},\left\{{E}_{\left({m}_1,{m}_2\right),\left({m}_3,{m}_4\right)}^n\right\}\right) \). Therefore, a high-order network describes the relationships among the temporal correlations of brain regions.

However, the high-order network has an obvious limitation: its scale is very large. As mentioned above, the number of vertices \( \left\{{\boldsymbol{C}}_{m_1,{m}_2}^n\right\} \) is M(M − 1), so the number of edges is proportional to \( {M}^4 \); for M = 90, there are 8010 vertices and 8010 × 8009 = 64,152,090 possible directed edges. Such a large network leads to high computational complexity and poor generalization performance. To overcome this limitation, Ward’s hierarchical grouping (Chen et al. 2014) is applied to group the correlation time series into clusters. When grouping the correlation time series of all subjects, the clustering results must be consistent across subjects. Thus, the correlation time series \( \left\{{\boldsymbol{C}}_{m_1,{m}_2}^n\right\} \) of all subjects (n = 1, 2, …, N) are first concatenated as \( {\boldsymbol{C}}_{m_1,{m}_2}={\left[{\left({\boldsymbol{C}}_{m_1,{m}_2}^1\right)}^T,{\left({\boldsymbol{C}}_{m_1,{m}_2}^2\right)}^T,\dots, {\left({\boldsymbol{C}}_{m_1,{m}_2}^N\right)}^T\right]}^T\in {R}^{KN\times 1} \). Then, we apply Ward’s hierarchical grouping (Chen et al. 2014) to group \( \left\{{\boldsymbol{C}}_{m_1,{m}_2}\right\} \) into U clusters and obtain the grouping results {Ω1, Ω2, …, ΩU}, where the ROI pair (m1, m2) ∈ Ωu if \( {\boldsymbol{C}}_{m_1,{m}_2} \) belongs to the u-th cluster. Correlation time series in the same cluster have similar temporal variations. Then, the mean correlation time series of the u-th cluster for the n-th subject is calculated as follows:

$$ {\overline{\boldsymbol{C}}}_u^n=\frac{\sum_{\left({m}_1,{m}_2\right)\in {\Omega}_u}{\boldsymbol{C}}_{m_1,{m}_2}^n}{\left|{\Omega}_u\right|}, $$
(5)

where |Ωu| is the number of elements in Ωu. Finally, the mean correlation time series of the clusters \( \left\{{\overline{\boldsymbol{C}}}_u^n\right\} \), instead of the individual correlation time series \( \left\{{\boldsymbol{C}}_{m_1,{m}_2}^n\right\} \), are used as the vertices of the high-order network. As a result, the scale of the high-order network is significantly reduced and its generalization capability is significantly improved (Chen et al. 2016).
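The subject-consistent clustering and the cluster means of Eq. (5) can be sketched with SciPy's `linkage(..., method="ward")` and `fcluster(..., criterion="maxclust")`. Using this SciPy routine as the Ward implementation, the (N, P, K) array layout, and the function name are assumptions of the sketch; the concatenation across subjects mirrors the construction of \( {\boldsymbol{C}}_{m_1,{m}_2} \) described above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_mean_series(C, U):
    """Cluster ROI-pair correlation series consistently across subjects and average them (Eq. (5)).

    C: (N, P, K) array, C[n, p] = correlation time series of ROI pair p for subject n.
    Returns labels (P,) with values 1..U and the cluster means with shape (N, U, K).
    """
    N, P, K = C.shape
    # One observation per ROI pair: its series for subjects 1..N concatenated (length K*N),
    # so a single clustering is shared by all subjects
    concat = C.transpose(1, 0, 2).reshape(P, N * K)
    tree = linkage(concat, method="ward")
    labels = fcluster(tree, t=U, criterion="maxclust")
    # Eq. (5): average the member series of each cluster, separately for each subject
    means = np.stack([C[:, labels == u, :].mean(axis=1) for u in range(1, U + 1)], axis=1)
    return labels, means
```

With `criterion="maxclust"`, SciPy returns at most U clusters, so a real pipeline would check that no cluster is empty before averaging.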

Feature Extraction, Selection, and Classification

Feature Extraction

The ULS Group constrained-UOLS sparse regression is an asymmetric network model; thus, the low-order and high-order ULS Group constrained-UOLS networks are asymmetric networks. In this work, we extract four typical features of asymmetric networks from the low-order and high-order ULS Group constrained-UOLS networks, respectively: the weighted clustering coefficient, betweenness centrality, in-degree, and out-degree (Rubinov and Sporns 2010; Fagiolo 2007). Specifically, the weighted clustering coefficient quantifies the probability that the neighbors of an ROI are connected to each other (Chen et al. 2016), and reflects the prevalence of clustered connectivity around this ROI (Rubinov and Sporns 2010). Betweenness centrality reflects the idea that central nodes, which lie on many shortest paths between other ROIs, act as important controllers of information flow in the brain network (Rubinov and Sporns 2010). In-degree and out-degree are the most fundamental measures of an asymmetric network, and most other measures are based on node degrees (Bullmore and Sporns 2009). These four features are widely used to describe the characteristics of brain networks (Chen et al. 2018, 2016; Khazaee et al. 2015, 2016). For a node i of an asymmetric network, the weighted clustering coefficient is defined by

$$ {WC}_i=\frac{\sum_{j\ne i}{\sum}_{h\ne \left(i,j\right)}\ \left({c}_{i,j}^{1/3}+{c}_{j,i}^{1/3}\right)\left({c}_{i,h}^{1/3}+{c}_{h,i}^{1/3}\right)\left({c}_{j,h}^{1/3}+{c}_{h,j}^{1/3}\right)}{2\left[{d}_i\left({d}_i-1\right)-2{d}_i^{\leftrightarrow}\right]}, $$
(6)

where \( c_{i,j} \) is the connectivity strength from node i to node j, \( d_i \) denotes the total degree of node i (the sum of its in-degree and out-degree), and \( {d}_i^{\leftrightarrow } \) denotes the number of bilateral edges between i and its neighbors. For a node i, the betweenness centrality is defined by

$$ {BC}_i={\sum}_{j\ne i,k\ne \left(i,j\right)}\frac{\rho_{jk}(i)}{\rho_{jk}}, $$
(7)

where \( \rho_{jk} \) is the number of shortest paths from k to j, and \( \rho_{jk}(i) \) denotes the number of those shortest paths from k to j that pass through i. The in-degree and out-degree of node i are defined as the number of connections from its neighbors to i and the number of connections from i to its neighbors, respectively.
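The four node-level features can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' released code: absolute values of the connectivity strengths are used in Eq. (6) (the cube roots assume non-negative weights), any non-zero \( c_{i,j} \) is treated as an edge, and betweenness (Eq. 7) is computed with Brandes' algorithm on the unweighted directed graph, so shortest paths count hops. The helper names (`weighted_clustering_directed`, `betweenness_directed`, `node_features`) are hypothetical.

```python
import numpy as np
from collections import deque


def weighted_clustering_directed(C):
    """Eq. (6). C[i, j] is the strength from node i to node j.

    Assumption: weights are made non-negative with abs(); self-loops ignored.
    """
    W = np.abs(np.asarray(C, dtype=float))
    np.fill_diagonal(W, 0.0)
    A = (W > 0).astype(float)                # binary adjacency
    S = np.cbrt(W) + np.cbrt(W).T            # c_ij^(1/3) + c_ji^(1/3)
    numer = np.diag(S @ S @ S)               # double sum over j, h of the triple product
    d_tot = A.sum(axis=0) + A.sum(axis=1)    # in-degree + out-degree
    d_bi = np.diag(A @ A)                    # number of bilateral edges i <-> j
    denom = 2.0 * (d_tot * (d_tot - 1) - 2.0 * d_bi)
    wc = np.zeros_like(numer)
    ok = denom > 0                           # nodes without enough neighbors get 0
    wc[ok] = numer[ok] / denom[ok]
    return wc


def betweenness_directed(C):
    """Eq. (7) via Brandes' algorithm on the unweighted directed graph."""
    adj = np.asarray(C) != 0
    np.fill_diagonal(adj, False)
    n = adj.shape[0]
    bc = np.zeros(n)
    for s in range(n):
        sigma = np.zeros(n)
        sigma[s] = 1.0                       # number of shortest s -> v paths
        dist = np.full(n, -1)
        dist[s] = 0
        preds = [[] for _ in range(n)]
        order, queue = [], deque([s])
        while queue:                         # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in np.flatnonzero(adj[v]):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = np.zeros(n)
        for w in reversed(order):            # accumulate pair dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def node_features(C):
    """Concatenate [WC, BC, in-degree, out-degree] into one 4 x M vector."""
    A = np.asarray(C) != 0
    np.fill_diagonal(A, False)
    in_deg = A.sum(axis=0).astype(float)     # connections j -> i
    out_deg = A.sum(axis=1).astype(float)    # connections i -> j
    return np.concatenate([weighted_clustering_directed(C),
                           betweenness_directed(C), in_deg, out_deg])
```

The Brain Connectivity Toolbox (Rubinov and Sporns 2010) provides reference implementations of these measures and would be the safer choice for real analyses.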

In this section, four types of features are extracted from each ROI (in the low-order network) or each cluster (in the high-order network). Hence, we obtain 4 × M low-order features and 4 × U high-order features for each subject (M = 90; the optimization of U is described in “Impact of the Number of Clusters U” section). We then arrange these features into a low-order feature vector (4 × M features) and a high-order feature vector (4 × U features), respectively.

Feature Selection

Inevitably, the features extracted from functional connectivity networks include some that are irrelevant or redundant for MCI classification. Such features increase the running time of the learning algorithm and degrade the generalization performance of the classifier (Jain and Singh 2018). We therefore feed the low-order and high-order feature vectors obtained in “Feature Extraction” section into a feature selection filter that removes these features in three stages. First, the correlation coefficient between each feature and the class labels of the training samples is calculated; a feature with a larger correlation coefficient is considered more discriminative for MCI classification, and features whose coefficients exceed a predefined threshold ε1 are selected to form a feature set γ1. Second, the Relief algorithm (Zhang et al. 2016) is applied to compute weights for the features that do not belong to γ1, and features whose weights exceed a predefined threshold ε2 form a feature set γ2. Third, by treating the feature values as classification scores for the training samples, we compute the area under the ROC curve (AUC) for the features that belong to neither γ1 nor γ2; features whose AUC exceeds a predefined threshold ε3 form the set γ3. Finally, γ = γ1 ∪ γ2 ∪ γ3 constitutes the final feature set for MCI classification.
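A minimal NumPy sketch of the three-stage filter is given below. It is an illustration, not the authors' implementation: features are ranked by the absolute Pearson correlation with the labels (the text does not say whether the sign is ignored), the basic Relief algorithm of Kira and Rendell (nearest hit and nearest miss under L1 distance on min-max scaled features, each training sample visited once) stands in for whichever Relief variant was actually used, and each feature's AUC is computed with the pairwise Mann-Whitney formulation. Labels are assumed to be 1 for MCI and 0 for NC, and each class needs at least two training samples. All function names are hypothetical.

```python
import numpy as np


def label_correlation(X, y):
    """Pearson correlation between every feature column and the labels."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.nan_to_num(r)                  # constant features get r = 0


def relief_weights(X, y):
    """Basic Relief: reward separation from the nearest miss, penalize the nearest hit."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Z = (X - X.min(axis=0)) / span           # min-max scaling to [0, 1]
    n = Z.shape[0]
    w = np.zeros(Z.shape[1])
    for i in range(n):
        dist = np.abs(Z - Z[i]).sum(axis=1)
        dist[i] = np.inf                     # a sample is not its own neighbor
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (np.abs(Z[i] - Z[miss]) - np.abs(Z[i] - Z[hit])) / n
    return w


def feature_auc(X, y):
    """AUC of each feature used directly as a score (ties count one half)."""
    pos, neg = X[y == 1], X[y == 0]
    gt = (pos[:, None, :] > neg[None, :, :]).mean(axis=(0, 1))
    eq = (pos[:, None, :] == neg[None, :, :]).mean(axis=(0, 1))
    return gt + 0.5 * eq


def three_stage_filter(X, y, eps1, eps2, eps3):
    """Return the sorted indices of gamma = gamma1 | gamma2 | gamma3."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    all_idx = np.arange(X.shape[1])
    r = np.abs(label_correlation(X, y))
    g1 = set(all_idx[r > eps1].tolist())
    rest = np.array([f for f in all_idx.tolist() if f not in g1], dtype=int)
    g2 = set()
    if rest.size:                            # Relief only on features outside gamma1
        g2 = set(rest[relief_weights(X[:, rest], y) > eps2].tolist())
    rest = np.array([f for f in all_idx.tolist() if f not in g1 | g2], dtype=int)
    g3 = set()
    if rest.size:                            # AUC only on the remaining features
        g3 = set(rest[feature_auc(X[:, rest], y) > eps3].tolist())
    return np.array(sorted(g1 | g2 | g3), dtype=int)
```

In the nested scheme described next, this filter would be fitted on training subjects only and the chosen indices then applied to the test fold.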

Classification

Owing to the limited sample size, we adopt a bagging-based nested 10-fold cross-validation scheme to determine the optimal parameters and evaluate the classification performance. The nested scheme contains two cross-validation loops. The inner loop, run on the training set (nine folds of subjects), determines the optimal feature selection parameters ε1 (searched over [0.1, 0.15, ⋯, 0.5]), ε2 ([0.01, 0.02, ⋯, 0.1]), and ε3 ([0.1, 0.15, ⋯, 0.5]); the outer loop, evaluated on the test set (the remaining fold of subjects), assesses the generalization performance of the DCT model.
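The nested scheme can be sketched as fold bookkeeping plus a grid search over the three thresholds listed above. The `evaluate` callback is hypothetical and stands for "select features with the given thresholds on the training indices, train the bagged DCT, and score the held-out indices". Folds are plain random splits here because the text does not say whether they are stratified, and the inner splits are fixed once per outer fold so that all threshold candidates are compared on the same data.

```python
import itertools
import numpy as np

# Threshold grids from the text: 9 x 10 x 9 = 810 candidates.
EPS1 = np.round(np.arange(0.10, 0.501, 0.05), 2)
EPS2 = np.round(np.arange(0.01, 0.101, 0.01), 2)
EPS3 = np.round(np.arange(0.10, 0.501, 0.05), 2)


def kfold_indices(n, k, rng):
    """Yield (train, test) index arrays for a random k-fold split of n subjects."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def nested_cv(n, evaluate, k=10, seed=0):
    """evaluate(train_idx, test_idx, eps) -> score; a hypothetical callback."""
    rng = np.random.default_rng(seed)
    grid = list(itertools.product(EPS1, EPS2, EPS3))
    outer_scores, chosen = [], []
    for train, test in kfold_indices(n, k, rng):
        # Inner loop: only the outer-training subjects are used for tuning.
        inner_splits = list(kfold_indices(len(train), k, rng))

        def inner_score(eps):
            return np.mean([evaluate(train[tr], train[va], eps)
                            for tr, va in inner_splits])

        best = max(grid, key=inner_score)
        chosen.append(best)
        # Outer loop: the tuned thresholds are scored on the untouched fold.
        outer_scores.append(evaluate(train, test, best))
    return float(np.mean(outer_scores)), chosen
```

Keeping the test fold out of the inner loop is what makes the outer score an estimate of generalization rather than of tuning quality.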

The bagging approach builds a strong classifier by combining a series of weak classifiers, each trained on a bootstrap sample of the training dataset (Shah et al. 2015; Breiman 1996). The weak classifiers need to be relatively unstable, meaning that their predictions change considerably even when the training data change only slightly. We therefore adopt a typically unstable classifier (DCT) as the weak classifier and combine the resulting weak classifiers via bagging to generate a strong classifier.
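Bagging itself is short to sketch. Below, a depth-1 decision tree (a decision stump) stands in for the DCT weak learner purely to keep the example small; each stump is trained on a bootstrap sample drawn with replacement, and the fraction of stumps voting MCI (label 1) is used as the classification score. The number of estimators, the vote-averaging rule, and all helper names are assumptions rather than settings from this study.

```python
import numpy as np


def fit_stump(X, y):
    """Depth-1 tree: pick (feature, threshold, polarity) with the lowest training error."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = (pol * (X[:, f] - t) > 0).astype(int)
                err = np.mean(pred != y)
                if err < best[0]:
                    best = (err, f, t, pol)
    return best[1:]


def predict_stump(stump, X):
    f, t, pol = stump
    return (pol * (X[:, f] - t) > 0).astype(int)


def bagging_scores(X_train, y_train, X_test, n_estimators=25, seed=0):
    """Fraction of bootstrapped weak learners that vote for label 1."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    votes = np.zeros(len(X_test))
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)     # bootstrap sample with replacement
        stump = fit_stump(X_train[idx], y_train[idx])
        votes += predict_stump(stump, X_test)
    return votes / n_estimators
```

Because each learner sees a different bootstrap sample, unstable learners disagree on borderline subjects, and averaging their votes reduces variance; this is the property the text relies on when choosing an unstable base classifier.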

In each fold of the outer cross-validation loop, two DCT sub-models are constructed, one on the low-order feature vector and one on the high-order feature vector selected in “Feature Selection” section. For the test set, each DCT sub-model provides a classification score for each subject, and the final classification score for each subject is obtained by combining the scores of the two DCT sub-models with a fusion weight α.

To compare the performance of different methods, we use four common performance measures: accuracy (ACC), sensitivity (SEN), specificity (SPC), and AUC (Li et al. 2018b; Wang et al. 2017). ACC, SEN, and SPC are defined as follows (Li et al. 2017):

$$ \mathrm{ACC}=\frac{TP+ TN}{TP+ FN+ TN+ FP} $$
(8)
$$ \mathrm{SEN}=\frac{TP}{TP+ FN} $$
(9)
$$ \mathrm{SPC}=\frac{TN}{TN+ FP} $$
(10)

where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively. To obtain reliable evaluation results, we repeat the 10-fold cross-validation 20 times and report the classification results averaged over these 20 repetitions.
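Eqs. (8)-(10) translate directly into code. The sketch below assumes MCI is the positive class (label 1); the function names are illustrative.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) with label 1 as the positive (MCI) class."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, fp, tn, fn


def acc_sen_spc(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + fn + tn + fp)    # Eq. (8)
    sen = tp / (tp + fn)                     # Eq. (9)
    spc = tn / (tn + fp)                     # Eq. (10)
    return acc, sen, spc
```

Following the protocol above, these measures would be computed for each of the 20 repetitions and then averaged, for example with `np.mean` over the repetition axis.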

Summary of Methodology

The proposed framework for fusing high- and low-order functional connectivity networks is summarized as follows:

(1) Data Acquisition and Preprocessing:

  1) Scan the fMRI images of twenty-eight MCI patients and thirty-three NCs using a standard echo-planar imaging (EPI) sequence on a 3 Tesla Siemens TRIO scanner;

  2) Preprocess the fMRI images using the SPM8 software package, including removal of the first 10 fMRI volumes, slice timing correction, head-motion correction, regression of nuisance signals, signal de-trending, and band-pass filtering.

(2) Construction of the low-order networks:

  1) Connect each ROI time series with its weak derivatives to generate the ultra-ROI time series;

  2) Detect the topology of the low-order networks via the ULS Group constrained topology structure detection algorithm;

  3) Apply the UOLS algorithm to estimate the functional connectivity strengths of the low-order networks.

(3) Construction of the high-order networks:

  1) Apply a sliding window to partition each original ROI time series into multiple overlapping segments (subsequences);

  2) Generate the ultra-ROI time series segments by connecting each ROI time series segment with its weak derivatives;

  3) As in the construction of the low-order networks, construct the temporal low-order networks from the ultra-ROI time series segments via the ULS Group constrained topology structure detection algorithm and the UOLS algorithm;

  4) Stack the temporal low-order networks of all subjects together (Leonardi et al. 2013) to generate a set of correlation time series, each corresponding to the strength of the edge at the same position in the stacked networks;

  5) Apply Ward’s hierarchical grouping (Chen et al. 2014) to group the correlation time series into clusters, calculate the mean correlation time series of each cluster, and take these mean correlation time series as the new vertices of the high-order network;

  6) Unlike the low-order networks, which are built from the ROI time series, construct the high-order networks from the mean correlation time series via the ULS Group constrained topology structure detection algorithm and the UOLS algorithm, yielding one high-order functional connectivity network per subject.

(4) Feature Extraction, Selection, and Classification:

  1) Extract low- and high-order features from the low- and high-order networks, respectively, and select the most discriminative features from each;

  2) Construct two DCT sub-models (Akhoondzadeh 2016) based on the selected low- and high-order features, respectively, and obtain the final classification score for each subject by combining the classification scores from the two DCT sub-models with a fusion weight α.

Results and Discussions

The Impact of Parameter Optimization

In this section, we investigate the influence of different parameters on the classification performance: the highest order of the weak derivatives L, the window size S, the step size r, the sliding window function, the number of clusters U, and the fusion weight α.

Effect of the Highest Order of the Weak Derivatives L

To capture the relations among fMRI data points, we incorporate the weak derivatives \( {D}^l{\boldsymbol{y}}_m^n\left(l=1,2,\dots, L\right) \) into the original ROI time series \( {\boldsymbol{y}}_m^n \), where L is the highest order of the weak derivatives and determines how many weak derivatives are used in the low- and high-order network construction. We explore how the classification performance of the low-order, high-order, and fused frameworks is influenced by L. Specifically, we vary L from 0 to 4 in steps of 1 and report the classification accuracy and AUC of the low-order, high-order, and fused methods in Fig. 2. When L = 0, the classification frameworks are constructed from the original ROI time series alone, without weak derivatives. The classification performance at L = 0 is inferior to that at 0 < L ≤ 4, confirming that incorporating the weak derivatives into the original ROI time series improves classification performance. Furthermore, the maximum accuracy and AUC are achieved at L = 2, suggesting that the first- and second-order derivatives together are sufficient to represent the relations among fMRI data points. A plausible explanation is that the effect of weak noise may be amplified in higher-order weak derivatives (L > 2).
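The noise-amplification explanation can be made concrete with a worked example. Assuming, for illustration only, that the weak derivatives behave like finite differences (the operator \( D^l \) is defined earlier in the paper and may differ), the l-th forward difference of i.i.d. noise with variance σ² has variance \( \binom{2l}{l}\sigma^2 \): it is a sum of noise terms weighted by \( (-1)^k\binom{l}{k} \), and \( \sum_k \binom{l}{k}^2 = \binom{2l}{l} \). The gain is 2, 6, 20, and 70 for l = 1, ..., 4, which the short simulation below checks empirically.

```python
import math
import numpy as np


def difference_noise_gain(l):
    """Variance gain of the l-th forward difference applied to i.i.d. noise."""
    return math.comb(2 * l, l)               # sum of squared binomial coefficients


rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 100_000)        # unit-variance white noise
for l in range(1, 5):
    empirical = np.var(np.diff(noise, n=l))  # l-th order finite difference
    print(l, difference_noise_gain(l), round(float(empirical), 2))
```

Under this finite-difference proxy, the noise power grows roughly threefold per additional order, which is consistent with the drop in performance observed for L > 2.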

Fig. 2 Classification accuracy (a) and AUC (b) with respect to L

Influence of Window Size, Step Size and Window Function on Classification Performance

A key step of the proposed method is partitioning each ROI time series into multiple overlapping segments with a sliding window. Determining the optimal window size (S), step size (r), and window function is therefore essential to the proposed classification framework. In this section, we investigate how these three factors affect the classification performance of the proposed high-order and fused methods. Specifically, we vary S from 45 to 85 in steps of 5 and r from 1 to 2. With a rectangular window function, the classification accuracy of the high-order and fused methods with respect to S and r is shown in Fig. 3.

Fig. 3 Classification accuracy of high-order and fused methods for different S and r

Compared with r = 2, the high-order and fused methods perform better with r = 1. A plausible explanation is that a larger step size yields fewer time series segments, and the number of segments equals the length of the correlation time series used in the high-order functional connectivity network. A relatively short correlation time series may carry insufficient information to represent the temporal variations of the correlations; in other words, some of these temporal variations may be missed when a relatively large step size is adopted.

With r = 1 fixed, the fused method achieves relatively high and robust accuracy when S lies between 55 and 70, and the high-order method achieves its highest accuracy within 65 ≤ S ≤ 70. When S becomes too small or too large, the classification accuracy decreases gradually, for two reasons. First, when S is too small, the temporal low-order networks are constructed from short time series segments, which seriously reduces the reliability of the correlations in these networks. Second, when S is too large, the number of time series segments is insufficient for computing the correlation time series, which makes the high-order correlations, calculated from the low-order correlation time series, unreliable.

With r = 1 and S = 65 fixed, we explore the influence of the sliding window function using six commonly used window functions: rectangular, triangular, Hann, Hamming, sine, and Gaussian. Their mathematical representations are given in Table 2, and the classification accuracies of the high-order and fused methods are shown in Fig. 4. The triangular window yields the lowest classification accuracy, while the rectangular, Hann, and Hamming windows yield relatively high classification accuracy.
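Segmenting the ROI time series can be sketched as below. With window size S and step r, a series of length T yields K = ⌊(T − S)/r⌋ + 1 segments, which is why r = 2 roughly halves the length of each correlation time series compared with r = 1. The window definitions are common textbook forms (NumPy's `bartlett`, `hanning`, and `hamming`, a sine window, and a Gaussian with a hypothetical width parameter `gauss_sigma`); they are assumptions and may not match Table 2 exactly.

```python
import numpy as np


def window(name, S, gauss_sigma=0.4):
    """Common forms of the six window functions (assumed, see lead-in)."""
    n = np.arange(S)
    if name == "rectangular":
        return np.ones(S)
    if name == "triangular":
        return np.bartlett(S)                # triangle with zero end points
    if name == "hann":
        return np.hanning(S)
    if name == "hamming":
        return np.hamming(S)
    if name == "sine":
        return np.sin(np.pi * n / (S - 1))
    if name == "gaussian":                   # gauss_sigma is relative to the half-width
        half = (S - 1) / 2
        return np.exp(-0.5 * ((n - half) / (gauss_sigma * half)) ** 2)
    raise ValueError(f"unknown window: {name}")


def sliding_segments(Y, S, r, name="rectangular"):
    """Y: (T, M) ROI time series -> (K, S, M) windowed segments, K = (T - S)//r + 1."""
    T = Y.shape[0]
    w = window(name, S)[:, None]             # broadcast the window over ROIs
    return np.stack([Y[k:k + S] * w for k in range(0, T - S + 1, r)])
```

Each segment would then be fed to the temporal low-order network construction, so K is also the length of every correlation time series.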

Table 2 Mathematical representation of the tested sliding window functions
Fig. 4 Classification accuracy of high-order and fused methods for different sliding window functions

Impact of the Number of Clusters U

To reduce the scale of the high-order networks, we apply Ward’s hierarchical grouping method (Chen et al. 2014) to group the correlation time series into clusters. We conduct a series of experiments to explore the influence of the number of clusters U on classification performance. Specifically, we vary U from 100 to 700 in steps of 100 and report the ACCs of the proposed high-order and fused methods in Fig. 5. Both methods yield relatively high and robust classification accuracy, with the highest ACCs achieved at U = 300. However, the classification accuracy decreases gradually when U becomes too small or too large. This is reasonable because U determines the number of clusters and thus controls the scale of the high-order networks. When U is too small, correlation time series with different temporal variations are grouped into the same cluster, which reduces cluster purity and the reliability of each cluster’s mean correlation time series; this in turn makes the high-order networks, which are constructed from these mean time series, unreliable. Conversely, when U is too large, correlation time series with similar temporal variations are split into different clusters, which may make the high-order network too large and produce redundant high-order features, degrading classifier performance.

Fig. 5 Classification accuracy of high-order and fused methods with different numbers of clusters U

Effect of the Fusion Weight α

The fusion weight α determines the relative contributions of the low-order and high-order networks in the final fusion classification framework. A larger α gives the high-order networks a larger contribution and the low-order networks a smaller one. When α = 0, the fusion framework reduces to the low-order sub-model, and when α = 1, it reduces to the high-order sub-model. In this experiment, we investigate how α influences the classification performance of the fusion framework. Specifically, we vary α from 0 to 1 in steps of 0.05 and report the classification accuracy and AUC of the fusion framework in Fig. 6. Relatively high accuracy is achieved within 0.45 ≤ α ≤ 0.55, indicating roughly equal contributions from the low- and high-order connectivity. In addition, an AUC above 0.95 is achieved over a relatively wide range, 0.45 ≤ α ≤ 0.85, indicating robust and consistent classification performance with respect to α. Furthermore, higher classification accuracy and AUC are achieved when the fusion strategy is used (0 < α < 1), indicating the advantage of fusing the dynamic correlation information with the holistic correlation information.
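A minimal sketch of the score fusion and the α sweep follows. It assumes a convex combination, s = (1 − α)·s_low + α·s_high, which matches the limiting cases described above (α = 0 gives the low-order sub-model, α = 1 the high-order sub-model), and a decision threshold of 0.5 on the fused score. Both are assumptions, since the exact fusion rule is not spelled out here; the function names are illustrative.

```python
import numpy as np


def fuse(score_low, score_high, alpha):
    """Convex combination: alpha = 0 -> low-order only, alpha = 1 -> high-order only."""
    return (1.0 - alpha) * np.asarray(score_low) + alpha * np.asarray(score_high)


def sweep_alpha(score_low, score_high, y, threshold=0.5):
    """Accuracy of the fused score for alpha = 0, 0.05, ..., 1."""
    alphas = np.round(np.arange(0.0, 1.0001, 0.05), 2)
    accs = [np.mean((fuse(score_low, score_high, a) > threshold) == y)
            for a in alphas]
    return alphas, np.array(accs)
```

In a toy case where each sub-model misclassifies a different half of the subjects, an intermediate α can classify all of them correctly, which mirrors the benefit of fusion reported above.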

Fig. 6 Classification accuracy (a) and AUC (b) of the proposed fused method with respect to α

Classification Accuracy

In this work, we compare the MCI classification performance of the proposed fused ULS Group constrained-UOLS method with several other methods on the same dataset: the conventional low-order correlation-based method, the high-order correlation-based method (Chen et al. 2016), the fused correlation-based method (Chen et al. 2016), the low-order ULS Group constrained-UOLS method, and the high-order ULS Group constrained-UOLS method. As mentioned in “Feature Extraction” section, we extract four types of features from the low-order and high-order ULS Group constrained-UOLS networks: the weighted clustering coefficient, betweenness centrality, in-degree, and out-degree. Therefore, 4 × M low-order features and 4 × U high-order features are extracted from the low-order and high-order ULS Group constrained-UOLS networks, respectively (M = 90 and U = 300). Because the low-order and high-order correlation-based networks are non-sparse symmetric matrices, only the weighted clustering coefficient and degree are extracted from them, giving 2 × M low-order features and 2 × U high-order features, respectively (M = 90 and U = 300). For each competing method, we use the nested 10-fold cross-validation scheme described in “Classification” section to optimize the feature selection parameters ε1, ε2, and ε3 and to evaluate the classification performance. The number of selected features fed into the classifier of each method is provided in Table 3.

Table 3 The number of extracted features and selected features for each method

The performance measures, averaged over 20 repetitions of 10-fold cross-validation, are shown in Table 4. The proposed fused ULS Group constrained-UOLS method yields the best accuracy (85.5%), the best sensitivity (86.6%), and the best AUC (0.960). The best specificity (88.9%) is obtained by the high-order ULS Group constrained-UOLS method, which further indicates that the dynamic correlation information is important and should be incorporated into MCI classification. These results demonstrate the superiority of the fused ULS Group constrained-UOLS method in three respects. First, the sparse regression-based models (ULS Group constrained-UOLS) outperform the correlation-based models, indicating that the sparse network structure and the information of weak derivatives are beneficial to MCI classification. Second, the high-order networks perform better than the low-order networks, confirming the importance of dynamic correlation information for MCI classification. Third, the fused networks achieve better classification performance than the corresponding low-order and high-order networks, demonstrating the necessity of simultaneously considering the dynamic and holistic correlation information. Furthermore, based on two-sample t-tests over the 20 repetitions, our proposed method performs significantly better than all the competing methods in terms of ACC, SPC, and AUC. Table 4 also reports the p-values for the comparison of AUC between the proposed method and the other methods. Figure 7 shows the ROC curves of all compared methods. The proposed method achieves the largest area under the ROC curve, indicating good diagnostic ability across discrimination thresholds.

Table 4 Performance comparison between correlation-based and ULS Group constrained-UOLS networks
Fig. 7 ROC curves for all compared methods

Low-Order Functional Connectivity Network

Figure 8a, b show two examples of low-order functional connectivity networks for MCI and NC subjects, respectively. The discrepancy network, obtained by subtracting the MCI network from the NC network, is shown in Fig. 9. The two networks show relatively different patterns in terms of topology and connectivity strength. For instance, the connectivity between the bilateral precuneus is disrupted in MCI subjects, whereas NC subjects show stronger connectivity between them; this is consistent with previous studies (Haussmann et al. 2017; Mi et al. 2017) reporting that atrophy in the precuneus is related to AD/MCI. Figure 10a, b graphically show the topology of the aforementioned networks for MCI and NC subjects, respectively. With the same threshold of 0.4, 51 connections survive in the NC network, while only 42 survive in the MCI network. This observation is consistent with a previous study (Zhou et al. 2013) and suggests that some connections of the MCI network have been disrupted by pathological attacks.
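Counting the surviving connections is short once a convention is fixed. The sketch assumes that a connection survives when its absolute strength is at least 0.4 and counts directed off-diagonal entries; whether the reported counts use absolute values, ≥ versus >, or undirected pairs is not stated, so these choices are assumptions. The discrepancy network of Fig. 9 is simply the element-wise difference NC − MCI. Both helper names are hypothetical.

```python
import numpy as np


def surviving_connections(C, threshold=0.4):
    """Count directed connections whose absolute strength reaches the threshold."""
    C = np.array(C, dtype=float)             # copy so the caller's matrix is untouched
    np.fill_diagonal(C, 0.0)                 # ignore self-connections
    return int(np.count_nonzero(np.abs(C) >= threshold))


def discrepancy(C_nc, C_mci):
    """Element-wise NC minus MCI network."""
    return np.asarray(C_nc, dtype=float) - np.asarray(C_mci, dtype=float)
```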

Fig. 8 Low-order functional connectivity networks for MCI (a) and NC (b)

Fig. 9 The discrepancy between the low-order networks for MCI and NC

Fig. 10 The structure of the two aforementioned networks for MCI (a) and NC (b)

Moreover, to demonstrate the temporal variation of correlations, we compare a series of temporal low-order functional connectivity networks. Specifically, Fig. 11a shows the low-order functional connectivity network of one MCI subject, which characterizes the holistic correlations between ROIs computed over the whole time series; these low-order correlations show no temporal variation. To extract the temporal variation information contained in the fMRI time series, the sliding window approach is employed to construct a collection of temporal low-order functional connectivity networks, each computed from an fMRI subseries whose length equals the sliding window size. Some of these temporal networks are shown in Fig. 11b-f. These temporal low-order networks, computed from fMRI subseries, change continually over the scanning period. Therefore, the conventional low-order network, computed from the entire fMRI time series, cannot characterize the temporal variations of correlations between ROIs over the scanning period, and another type of network is needed to characterize the dynamic correlations between ROIs. Existing methods that use dynamic networks (temporal low-order networks) to diagnose brain diseases (Wee et al. 2016; Zhu et al. 2016) extract network features directly from the temporal low-order networks to construct brain disease classifiers. These methods do not account for the temporal continuity of the correlations between ROIs; that is, they treat each temporal low-order network independently and ignore the relative temporal positions of the networks. The absence of this information may lead to a suboptimal classification model.
Therefore, we further construct high-order networks to overcome this limitation of the temporal low-order network methods.

Fig. 11 Conventional and temporal low-order functional connectivity networks for MCI. (a) Conventional network; (b-f) Temporal networks

Clustering in the Construction of High-Order Functional Connectivity Networks

As mentioned earlier, Ward’s hierarchical grouping (Chen et al. 2014) is applied to group the correlation time series into clusters. This clustering method requires no parameter initialization (such as initial cluster centers), which reduces its dependence on hyper-parameters and improves the robustness of the clustering results. Some correlation time series are displayed in Fig. 12a. The correlations between ROIs clearly vary greatly over the scanning period. Furthermore, a significant correlation between ROIs occurs only for a short period of time and is almost zero for the rest of the time series. Figure 12b shows the clustering results for the correlation time series, with series belonging to the same cluster depicted in the same color. Comparing Figs. 12a and b, we find that correlation time series with similar temporal variations are grouped into the same cluster, whereas those with dissimilar temporal variations are assigned to different clusters. Figure 12c shows the mean correlation time series of each cluster. By using these mean correlation time series as the new vertices of a high-order network, we construct a collection of small-scale high-order functional connectivity networks without losing much of the important dynamic correlation information.
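Ward’s criterion merges, at each step, the two clusters whose union causes the smallest increase in the within-cluster sum of squares, \( \Delta = \frac{n_a n_b}{n_a + n_b}\lVert \bar{x}_a - \bar{x}_b \rVert^2 \). The naive sketch below (cubic in the number of series P, so only suitable for small examples) groups the rows of a matrix of correlation time series into U clusters and then averages each group, giving the new vertices of the high-order network. For realistic sizes, SciPy’s `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(Z, t=U, criterion="maxclust")` is the usual route; the helper names here are hypothetical.

```python
import numpy as np


def ward_clusters(X, U):
    """Naive agglomerative Ward clustering of the rows of X into U clusters.

    X: (P, K) array, one correlation time series per row.
    Returns integer labels in 0..U-1.
    """
    clusters = [[i] for i in range(len(X))]
    cents = [X[i].astype(float) for i in range(len(X))]
    while len(clusters) > U:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * np.sum((cents[a] - cents[b]) ** 2)
                if cost < best:
                    best, pair = cost, (a, b)
        a, b = pair
        na, nb = len(clusters[a]), len(clusters[b])
        cents[a] = (na * cents[a] + nb * cents[b]) / (na + nb)
        clusters[a] += clusters[b]
        del clusters[b], cents[b]            # b > a, so index a stays valid
    labels = np.empty(len(X), dtype=int)
    for u, members in enumerate(clusters):
        labels[members] = u
    return labels


def cluster_means(X, labels):
    """Mean correlation time series of each cluster (the high-order vertices)."""
    return np.stack([X[labels == u].mean(axis=0) for u in np.unique(labels)])
```

Because no initial centers are drawn, repeated runs on the same data give the same partition, which is the robustness property mentioned above.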

Fig. 12 Clustering results of the correlation time series. (a) Original correlation time series; (b) Three clusters of the correlation time series based on Ward’s algorithm (correlation time series from the same cluster are depicted in the same color); (c) The mean correlation time series of each cluster

The Most Discriminative Regions and Clusters

The most discriminative brain regions and clusters are defined as those selected most frequently over the 20 repetitions of 10-fold cross-validation, for the low-order and high-order networks, respectively. As mentioned above, clusters replace the original correlation time series as the vertices of the high-order networks to reduce network scale. Hence, for the high-order networks, we select the most discriminative clusters as biomarkers for MCI identification.
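Selection frequencies can be tallied as follows. The sketch assumes the feature layout [WC, BC, in-degree, out-degree] with M features per block, so that feature index b·M + m belongs to ROI m, and counts an ROI at most once per fold if any of its features was selected. Both conventions are assumptions, since the text does not specify how feature-level selections map to regions; the function name is hypothetical.

```python
import numpy as np


def top_regions(selected_per_fold, M, k=20):
    """selected_per_fold: one array of selected feature indices per fold (20 x 10 = 200)."""
    counts = np.zeros(M, dtype=int)
    for idx in selected_per_fold:
        rois = np.unique(np.asarray(idx, dtype=int) % M)  # feature b*M + m -> ROI m
        counts[rois] += 1                                  # at most once per fold
    order = np.argsort(-counts, kind="stable")             # ties keep the lower index
    return order[:k], counts
```

The same tally applies to the high-order network with M replaced by the number of clusters U.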

The Most Discriminative Brain Regions and Low-Order Connections

The most discriminative brain regions selected from the low-order networks are listed in Table 5 and displayed in Fig. 13; they include the right olfactory cortex (OLF.R), right inferior frontal gyrus triangular part (IFGtriang.R), left hippocampus (HIP.L), left median cingulate and paracingulate gyri (DCG.L), and left superior frontal gyrus orbital part (ORBsup.L). These regions have frequently been reported as highly associated with AD/MCI pathology (Chen et al. 2016; Matsuda 2013; Salvatore et al. 2015; Vasavada et al. 2015; Xu et al. 2016). In particular, olfactory deficits are prevalent in AD/MCI patients (Vasavada et al. 2015). Olfactory deficits normally appear before clinical cognitive and memory deficits, which is consistent with the fact that AD preferentially attacks the central olfactory structures (Vasavada et al. 2015). Hence, decline in the olfactory cortex (OLF) may serve as an early and effective biomarker for distinguishing MCI from healthy elderly subjects. Previous works have also reported that the hippocampus (HIP) is a key brain region for memory (Jaroudi et al. 2017; Du et al. 2001; Sze et al. 1997), and hippocampal atrophy is believed to be an early biomarker of AD/MCI. In addition, the most discriminative brain regions are located mainly in the frontal and limbic lobes. Previous studies have reported the frontal lobe, which plays an important role in integrating stored non-task-based memories, as an important brain area associated with AD/MCI pathology (Ribeiro and Busatto Filho 2016). The limbic lobe is believed to be closely related to language and memory, and AD patients show an obvious reduction in regional cerebral blood flow in this lobe (Takahashi et al. 2017).

Table 5 The top 20 most discriminative ROIs selected from the low-order network
Fig. 13 The top 20 most discriminative ROIs

To examine the relationship between the CDR score and the selected ROIs, we perform a multiple linear regression of the CDR scores on the feature values of the top 20 most discriminative ROIs. An analysis of variance (ANOVA) is then used to test the significance of this relationship. The ANOVA F-statistic is 2.003 (p = 0.03), which exceeds the critical F value at the 0.05 significance level, indicating that the CDR scores are significantly linearly related to the feature values of the selected ROIs. Thus, the most discriminative ROIs selected in this work are significantly associated with the CDR score, in line with the diagnostic criteria for the MCI subjects used in this study.
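The regression check can be sketched as an ordinary least-squares fit with an intercept followed by the overall ANOVA F-statistic, F = (SSR/p) / (SSE/(n − p − 1)), where p is the number of predictors (here the feature values of the selected ROIs) and n the number of subjects. The p-value comes from the F distribution with (p, n − p − 1) degrees of freedom, for example `scipy.stats.f.sf(F, p, n - p - 1)`; the function name below is illustrative.

```python
import numpy as np


def regression_f_test(X, y):
    """Overall ANOVA F-statistic of an OLS regression of y on the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    D = np.column_stack([np.ones(n), X])            # design matrix with intercept
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    fitted = D @ beta
    sse = np.sum((y - fitted) ** 2)                 # residual sum of squares
    ssr = np.sum((fitted - y.mean()) ** 2)          # regression sum of squares
    return (ssr / p) / (sse / (n - p - 1))
```

For a single predictor with x = 1, 2, 3, 4 and y = 1, 3, 2, 4, SSR = 3.2 and SSE = 1.8, so F = 3.2 / 0.9 ≈ 3.56.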

In addition, we compute the low-order connections among the top 20 most discriminative ROIs by averaging the low-order connectivity strengths over all MCI subjects and over all NC subjects, respectively; these connections are shown in Fig. 14. The discriminative power of each connection between MCI and NC subjects is evaluated using a two-sample t-test, and the p-values of all connections are provided in Fig. 15a. Figure 15b shows the connections that differ significantly (p < 0.05) between NC and MCI subjects. The statistical analysis reveals a significant difference in the functional connection between the left and right hippocampi (HIP.L and HIP.R): the connectivity strength between HIP.L and HIP.R is significantly larger in MCI subjects than in NC subjects. A possible interpretation is that MCI subjects may require stronger interactions between some brain regions than NC subjects to compensate for cognitive impairment (Salvatore et al. 2015), which is consistent with the brain compensation phenomenon reported in the literature (Jie et al. 2016; Qi et al. 2010).
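The connection-wise comparison can be sketched with NumPy as a pooled-variance (Student) two-sample t-statistic computed for every entry of the connectivity matrices at once. `scipy.stats.ttest_ind(A, B, axis=0)` returns the same statistic together with the p-values used for the p < 0.05 criterion. The helper below is illustrative and assumes the equal-variance form, which the text does not state.

```python
import numpy as np


def two_sample_t(A, B):
    """Pooled-variance t-statistic for each connection.

    A: (n_a, R, R) connectivity matrices of one group (e.g. MCI);
    B: (n_b, R, R) matrices of the other group (e.g. NC).
    A positive value means the connection is stronger on average in A.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    na, nb = len(A), len(B)
    va, vb = A.var(axis=0, ddof=1), B.var(axis=0, ddof=1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)   # pooled variance
    with np.errstate(invalid="ignore", divide="ignore"):
        return (A.mean(axis=0) - B.mean(axis=0)) / np.sqrt(sp2 * (1 / na + 1 / nb))
```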

Fig. 14 Connectivity strengths among the top 20 most discriminative ROIs identified from the low-order network

Fig. 15 p-values of connections among the top 20 most discriminative ROIs (a), and p-values less than 0.05 (b)

The Most Discriminative Clusters and High-Order Connections

For the high-order networks, the top 20 most discriminative clusters and the ROI pairs they contain are listed in Table 6. Most of these clusters contain only one ROI pair. The ROI pairs in the most discriminative clusters include left middle occipital gyrus−right middle occipital gyrus (MOG.L−MOG.R), left Heschl gyrus−left superior temporal gyrus (HES.L−STG.L), and left middle occipital gyrus−left inferior occipital gyrus (MOG.L−IOG.L), suggesting that the time-varying characteristics of the correlation time series for these ROI pairs play an important role in MCI classification.

Table 6 The top 20 most discriminative clusters selected from the high-order network

Additionally, Fig. 16 shows the differences in high-order connections among the top 20 most discriminative clusters between MCI and NC subjects. Unlike a low-order connection, which represents the functional connectivity strength between two ROIs, a high-order connection, calculated from the low-order correlation time series, describes the relation between two clusters (i.e., the relation between low-order connections). A significant difference in high-order connection is observed between cluster 13 (containing ROI pair MOG.L−IOG.L) and cluster 9 (containing ROI pair MOG.L−MOG.R). This high-order connection (row 13, column 9) is stronger in NC subjects than in MCI subjects, implying that the temporal variations of the low-order connectivity strength between MOG.L and IOG.L are highly correlated with those between MOG.L and MOG.R in NC subjects, but not in MCI subjects. This difference between MCI and NC may serve as an effective biomarker for distinguishing MCI from NC.

Fig. 16 Comparison of correlation differences among the top 20 most discriminative clusters identified from the high-order connectivity networks, where zero elements indicate no correlation between two clusters

Conclusion

In this paper, we proposed a novel approach that fuses high- and low-order functional connectivity networks for MCI classification. By using the ULS Group constrained topology structure detection algorithm and the UOLS algorithm, our approach extracts the temporal information of connections among fMRI data, which is further used to construct the functional connections of the low- and high-order networks. Furthermore, hierarchical grouping (based on Ward’s algorithm) is applied to reduce the scale and computational complexity of the high-order network. By fusing DCT sub-models trained separately on the low- and high-order networks, our approach integrates both the holistic correlation information and its temporal variations at the disease identification stage. The promising results demonstrate the superiority of the proposed method and the importance of integrating both conventional holistic correlations and dynamic temporal correlations for improving MCI classification performance.

Information Sharing Statement

The dataset used in this paper is from the Alzheimer’s Disease Neuroimaging Initiative (ADNI, RRID:SCR_003007), available at http://adni.loni.usc.edu/. The source code developed in this paper is available on GitHub (https://github.com/792866506/High-and-Low-order-Networks).