Introduction

Many pattern classification methods have been proposed for the diagnosis of Alzheimer’s disease (AD) or its prodromal stage, mild cognitive impairment (MCI). However, most existing pattern classification methods address only binary classification, i.e., identifying from imaging data whether a subject is diseased or healthy. Recently, many researchers have started investigating pattern regression methods for estimating continuous clinical scores from brain images (Fan et al. 2010; Stonnington et al. 2010; Wang et al. 2010). Compared with pattern classification, pattern regression can help assess the pathological stage of a neurological disease and predict its future progression. Many diseases are known to cause a continuous spectrum of structural and functional changes. For example, AD pathology progresses gradually over many years, sometimes starting decades before the final clinical stage (Wang et al. 2010). Many studies combining neuropsychological and neuroimaging data show that preclinical AD is associated with both cognitive and imaging changes (Caselli et al. 2009, 2007; Reiman et al. 1996, 2009; Twamley et al. 2006). Therefore, pattern regression methods could be used to estimate continuous clinical scores from neuropsychological and neuroimaging data, helping to assess the stage of AD or to predict clinical outcome.

Most existing methods for estimating cognitive test scores use only a single data modality, e.g., magnetic resonance imaging (MRI) (Fan et al. 2010; Stonnington et al. 2010; Wang et al. 2010). However, biomarkers from different modalities can provide complementary information for AD diagnosis, as demonstrated by recent work on multimodal AD diagnosis (Bouwman et al. 2007b; Chetelat et al. 2005; Fan et al. 2008b; Fellgiebel et al. 2007; Geroldi et al. 2006; Hinrichs et al. 2009a, b; Vemuri et al. 2009; Visser et al. 2002; Walhovd et al. 2010a, b; Zhang et al. 2011). For example, MRI can measure spatial patterns of atrophy and can thus be used to define surrogate markers of the underlying neurodegenerative AD pathology (Fan et al. 2008a). Fluorodeoxyglucose positron emission tomography (FDG-PET) may detect early neocortical dysfunction before atrophy appears (De Santi et al. 2001). Specifically, several recent studies have reported reduced glucose metabolism in parietal, posterior cingulate, and temporal brain regions on FDG-PET in AD patients (Diehl et al. 2004; Drzezga et al. 2003). Moreover, even before atrophy and reduced glucose metabolism appear, abnormalities in cerebrospinal fluid (CSF) biomarkers, such as the concentration of amyloid β (Aβ42), can be detected in AD patients (Bouwman et al. 2007b; de Leon et al. 2007; Fjell et al. 2010). Therefore, these multimodal data could be used jointly to estimate clinical cognitive test scores.

In previous work, the supervised relevance vector regression (RVR) method (Tipping 2001) has been adopted to estimate continuous clinical scores from brain images (Fan et al. 2010; Stonnington et al. 2010; Wang et al. 2010). Moreover, using linear regression modeling in subjects with MCI, Duchesne et al. (2009) demonstrated a relationship between baseline MRI features and the decline in the Mini-Mental State Exam (MMSE) score after 1 year. In addition, the Alzheimer’s Disease Assessment Scale-Cognitive subtest (ADAS-Cog) is the gold standard for cognitive assessment in AD drug trials (Rosen et al. 1984). All of the aforementioned studies employ supervised regression models, which often require a large amount of labeled training data to achieve good performance. However, the amount of labeled brain image data is usually limited in clinical practice. Also, due to the heterogeneity of MCI, the clinical scores of MCI subjects obtained in cognitive tests, including MMSE and ADAS-Cog (Inoue et al. 2011), are less stable than those of AD and normal control (NC) subjects. To partially alleviate this issue, in this study we do not use the clinical scores of MCI subjects; instead, we use only their multimodal data to help train a semi-supervised regression model.

To the best of our knowledge, no previous study has investigated the use of MCI subjects as unlabeled data for constructing a semi-supervised regression model to estimate cognitive performance from multimodal imaging and non-imaging data. We do, however, note that some investigators have used MCI subjects (also as unlabeled data) to construct semi-supervised classification models for classifying AD patients versus normal controls (NC) (Filipovych et al. 2011; Zhang and Shen 2011). However, these semi-supervised classification models can only assign discrete categories (usually binary, i.e., diseased or normal) and cannot estimate the continuous clinical scores of disease progression considered in this paper. We propose a semi-supervised multimodal relevance vector regression (SM-RVR) method to estimate the clinical scores of new subjects from both imaging and biological biomarkers (i.e., MRI, FDG-PET, and CSF). Moreover, we develop a new strategy to select the most informative MCI subjects for training the SM-RVR model.

In the proposed SM-RVR method, we first employ a k-nearest neighbor (k-NN) regression model to estimate the clinical scores of the MCI training subjects. Then, we use a supervised multimodal RVR (M-RVR) to select the most informative MCI subjects by iteratively testing how much each MCI training subject improves the estimation of clinical scores. Finally, we train an M-RVR model using all AD and NC training subjects together with the selected informative MCI subjects. We evaluated our method on 202 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database and achieved promising results compared with state-of-the-art methods.

Method

The data used in the preparation of this article were obtained from the ADNI database (www.loni.ucla.edu/ADNI). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians in developing new treatments and monitoring their effectiveness, as well as to lessen the time and cost of clinical trials.

ADNI is the result of efforts by many co-investigators from a broad range of academic institutions and private corporations. Subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 adults, aged 55 to 90, to participate in the research: approximately 200 cognitively normal older individuals to be followed for 3 years, 400 people with MCI to be followed for 3 years, and 200 people with early AD to be followed for 2 years (see www.adni-info.org for up-to-date information). The research protocol was approved by each local institutional review board and written informed consent was obtained from each participant.

Subjects

The ADNI general eligibility criteria are described at www.adni-info.org. Briefly, subjects are between 55 and 90 years of age and have a study partner able to provide an independent evaluation of functioning. Subjects taking specific psychoactive medications are excluded. General inclusion/exclusion criteria are as follows: 1) healthy subjects: Mini-Mental State Examination (MMSE) scores between 24 and 30, a Clinical Dementia Rating (CDR) of 0, non-depressed, non-MCI, and non-demented; 2) MCI subjects: MMSE scores between 24 and 30, a memory complaint, objective memory loss measured by education-adjusted scores on the Wechsler Memory Scale Logical Memory II, a CDR of 0.5, absence of significant impairment in other cognitive domains, essentially preserved activities of daily living, and an absence of dementia; and 3) mild AD subjects: MMSE scores between 20 and 26, a CDR of 0.5 or 1.0, and meeting the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS/ADRDA) criteria for probable AD.

In this paper, we selected all ADNI subjects with baseline MRI, FDG-PET, and CSF data available, yielding a total of 202 subjects: 51 AD patients, 99 MCI patients (43 MCI converters (MCI-C) and 56 MCI non-converters (MCI-NC)), and 52 normal controls. Table 1 lists the demographics of these subjects.

Table 1 Clinical and demographic information for 202 subjects (mean ± std)

MRI, PET and CSF

All MRI data were acquired on 1.5 T scanners. Data were collected on a variety of scanners with protocols individualized for each scanner. Raw Digital Imaging and Communications in Medicine (DICOM) MRI scans were reviewed for quality. Spatial distortions caused by gradient nonlinearity and B1 field inhomogeneity were automatically corrected. PET images were acquired 30–60 min post-injection, averaged, spatially aligned, interpolated to a standard voxel size, intensity normalized, and smoothed to a common resolution of 8-mm full width at half maximum. CSF was collected in the morning after an overnight fast using a 20- or 24-gauge spinal needle, frozen within 1 h of collection, and transported on dry ice to the ADNI Biomarker Core laboratory at the University of Pennsylvania Medical Center. In this paper, CSF Aβ42, CSF t-tau, and CSF p-tau are used as features.

Image Analysis

To obtain effective image features for score estimation, we pre-processed the images following the procedure in (Zhang et al. 2011). First, we corrected the intensity inhomogeneity of all images using the N3 algorithm (Sled et al. 1998). Next, we performed skull-stripping on each structural MR image using both the brain surface extractor (BSE) (Shattuck et al. 2001) and the brain extraction tool (BET) (Smith 2002), and then removed the cerebellum. After skull-stripping and cerebellum removal, we used the FSL package (Zhang et al. 2001) to segment each structural MR image into three tissues: gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). After tissue segmentation, we used atlas warping to partition each subject’s brain into 93 ROIs defined by the template shown in Fig. 1, and computed the GM tissue volume in each of the 93 ROIs as a feature. For each PET image, we used a rigid transformation to align it to the MR image of the same subject, and then computed the average PET intensity in each ROI as a feature. This pre-processing yields 93 MRI features and 93 PET features per subject. In addition, we used the 3 CSF biomarkers mentioned above as features.
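The per-ROI feature computation can be sketched as follows. This is a minimal NumPy sketch, not the authors' pipeline: it assumes that the warped atlas label map (0 = background, 1 to 93 = ROIs), a binary GM mask from tissue segmentation, and a PET image already rigidly aligned to the same MR space are available as arrays of identical shape. The function name `roi_features` and the `voxel_volume_mm3` parameter are hypothetical.

```python
import numpy as np

N_ROIS = 93  # number of atlas ROIs used in the text


def roi_features(label_map, gm_mask, pet, voxel_volume_mm3=1.0):
    """Per-ROI GM volume (MRI) and mean intensity (PET) from co-registered arrays.

    label_map: int array, 0 = background, 1..N_ROIS = ROI labels from the warped atlas
    gm_mask:   bool array, True where a voxel was segmented as gray matter
    pet:       float array, PET image rigidly aligned to the subject's MR image
    """
    labels = label_map.ravel()
    # GM volume per ROI = (number of GM voxels in the ROI) * (volume of one voxel).
    gm_counts = np.bincount(labels, weights=gm_mask.ravel().astype(float), minlength=N_ROIS + 1)
    gm_volume = gm_counts[1:N_ROIS + 1] * voxel_volume_mm3
    # Mean PET intensity per ROI = (sum of intensities in the ROI) / (voxels in the ROI).
    pet_sums = np.bincount(labels, weights=pet.ravel(), minlength=N_ROIS + 1)[1:N_ROIS + 1]
    n_vox = np.bincount(labels, minlength=N_ROIS + 1)[1:N_ROIS + 1]
    with np.errstate(invalid="ignore", divide="ignore"):
        pet_mean = np.where(n_vox > 0, pet_sums / n_vox, 0.0)  # empty ROI -> 0
    return gm_volume, pet_mean
```

The 93 GM volumes, the 93 PET means, and the 3 CSF values then form the three modality blocks (189 features in total) used in the multimodal regression.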

Fig. 1

A template with 93 manually labeled ROIs used for automated labeling of MRI and PET images

Semi-Supervised Multimodal RVR (SM-RVR)

In this section, we first extend the standard relevance vector regression (RVR) method to multimodal RVR (M-RVR), and then use k-NN regression and M-RVR to select the most informative MCI subjects. Finally, we train an M-RVR model using all AD and NC training subjects together with the selected informative MCI subjects to estimate the clinical scores of new subjects.

Multimodal Relevance Vector Regression (M-RVR)

Relevance Vector Regression (RVR)

We first briefly review the standard RVR algorithm. RVR is a sparse kernel method formulated in a Bayesian framework (Tipping 2001). Given a training set \( \left\{ {{{\mathbf{x}}_i},{t_i}} \right\}_{i=1}^N \) of input feature vectors and their corresponding target values, RVR models the relationship between an input feature vector x and its target value t as:

$$ {t_i}=f\left( {{{\mathbf{x}}_i},\mathbf{w}} \right)+{\varepsilon_i} $$
(1)

where \( {\varepsilon_i} \) is the measurement noise (assumed to be independent and zero-mean Gaussian), and \( f\left( {\mathbf{x},\mathbf{w}} \right) \) is a linear combination of kernel basis functions \( k\left( {\mathbf{x},{{\mathbf{x}}_i}} \right) \) of the form:

$$ f\left( {\mathbf{x},\mathbf{w}} \right)=\mathop{\sum}\limits_{i=1}^N{w_i}k\left( {\mathbf{x},{{\mathbf{x}}_i}} \right)+{w_0} $$
(2)

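where \( \mathbf{w}={{\left( {{w_0},{w_1},\ldots,{w_N}} \right)}^T} \) is the weight vector. RVR places a zero-mean Gaussian prior on each weight \( {w_i} \), governed by its own precision hyperparameter \( {\alpha_i} \). When the hyperparameters are estimated by maximizing the marginal likelihood, many \( {\alpha_i} \) tend to infinity, driving the corresponding weights to zero; only the few training samples with nonzero weights (the relevance vectors) remain, which yields a sparse kernel regression model (Tipping 2001).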

Multimodal RVR (M-RVR)

We now extend RVR to multimodal RVR (M-RVR) by defining an integrated kernel function that compares two multimodal samples \( \mathbf{x} \) and \( {{\mathbf{x}}_i} \) as follows:

$$ k\left( {\mathbf{x},{{\mathbf{x}}_i}} \right)=\mathop{\sum}\limits_m{c_m}{k^{(m) }}\left( {{{\mathbf{x}}^{(m) }},\mathbf{x}_i^{(m) }} \right) $$
(3)

where \( {{\mathbf{x}}^{(m)}} \) and \( \mathbf{x}_i^{(m)} \) denote the m-th modality of the multimodal samples \( \mathbf{x} \) and \( {{\mathbf{x}}_i} \), respectively, \( {k^{(m)}} \) denotes the kernel function over the m-th modality, and \( {c_m} \) denotes the weight of the m-th modality. This integrated multiple kernel can be embedded into the conventional single-kernel RVR and thus solved by programs developed for the conventional single-kernel RVR (Tipping 2001). To find the optimal weights \( {c_m} \), we constrain them so that \( \mathop{\sum}\limits_m{c_m}=1 \) and adopt a coarse-grid search through cross-validation on the training samples, a strategy that has proven effective in our previous work on multi-kernel support vector machines (SVM) (Zhang and Shen 2011; Zhang et al. 2012, 2011). Compared with most existing multi-kernel learning methods (Rakotomamonjy et al. 2008; Wang et al. 2008; Xu et al. 2010), the main advantage of our method is that it can be conveniently embedded into the conventional RVR and thus solved with conventional RVR solvers, e.g., the Sparse Bayesian toolbox (Tipping 2001). In this paper, we use M-RVR to fuse data from three modalities, i.e., MRI, PET, and CSF. Figure 2 shows the flowchart of M-RVR for multimodal regression using MRI, PET, and CSF data.
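Because the integrated kernel of Eq. (3) is just another Gram matrix, it can be precomputed and handed to any kernel solver. A minimal NumPy sketch follows, assuming a Gaussian kernel for every modality (consistent with the Gaussian kernel reported in the validation settings) and allowing a separate width per modality, which is an assumption; the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gram matrix of k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) between rows of A and B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))


def multimodal_kernel(A_blocks, B_blocks, weights, sigmas):
    """Integrated kernel of Eq. (3): sum over m of c_m * k^(m)(x^(m), x_i^(m)).

    A_blocks, B_blocks: lists of per-modality feature matrices (one row per subject)
    weights: modality weights c_m, constrained to sum to 1
    sigmas:  per-modality Gaussian widths (assumption; a shared width also works)
    """
    if not np.isclose(sum(weights), 1.0):
        raise ValueError("modality weights c_m must sum to 1")
    K = np.zeros((A_blocks[0].shape[0], B_blocks[0].shape[0]))
    for A, B, c, s in zip(A_blocks, B_blocks, weights, sigmas):
        K += c * gaussian_kernel(A, B, s)
    return K


# Toy usage with random MRI (93), PET (93), and CSF (3) feature blocks for 10 subjects.
rng = np.random.default_rng(0)
train = [rng.standard_normal((10, d)) for d in (93, 93, 3)]
K_train = multimodal_kernel(train, train, weights=(0.4, 0.4, 0.2), sigmas=(16.0, 16.0, 2.0))
```

Since each Gaussian kernel has a unit diagonal and the weights sum to 1, the combined kernel also has a unit diagonal, so the weights change how modalities trade off without rescaling the kernel as a whole.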

Fig. 2

The flowchart of M-RVR

Semi-Supervised M-RVR (SM-RVR)

As discussed before, due to the heterogeneity of MCI, the clinical scores of MCI subjects are usually less stable than those of AD and NC subjects. Therefore, directly using MCI subjects (i.e., with their clinical scores as targets) together with AD and NC subjects to train the regression model could degrade performance. To address this issue, we first employ k-NN regression to estimate the clinical scores of MCI subjects from those of the AD and NC subjects, which are more stable. We then use M-RVR to select the most informative MCI subjects by iteratively testing how much each MCI training subject improves the estimation of clinical scores. These two steps are described below.

Estimating Clinical Scores of MCI Subjects

Figure 3 plots the distributions of AD, NC, and MCI subjects in the CSF feature space. As Fig. 3 shows, MCI subjects overlap substantially with AD and NC subjects. This overlap implies that, in the CSF feature space, MCI subjects are similar to some AD or NC subjects. Similar phenomena can be observed in the MRI and PET feature spaces, which partly explains why AD and NC subjects can be used to estimate the clinical scores of MCI subjects. The flowchart for estimating the clinical scores of MCI subjects is shown in Fig. 4. Specifically, for each of the MRI, PET, and CSF modalities, we used a separate k-NN regression (i.e., finding the k nearest neighbors among the AD and NC training subjects) to estimate the clinical score of each MCI subject, and then took the average of the three modality-specific estimates as the final clinical score of that MCI subject.
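A minimal NumPy sketch of this per-modality k-NN estimation follows. It assumes plain, unweighted k-NN averaging with Euclidean distance on already-normalized features, which the text does not specify; the function names are hypothetical.

```python
import numpy as np


def knn_regress(X_train, t_train, X_query, k):
    """Unweighted k-NN regression: mean target of the k nearest training rows (Euclidean)."""
    d2 = np.sum((X_query[:, None, :] - X_train[None, :, :]) ** 2, axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest AD/NC subjects
    return t_train[nn].mean(axis=1)


def estimate_mci_scores(labeled_blocks, t_labeled, mci_blocks, k):
    """One k-NN per modality (fit on AD + NC), then average the per-modality estimates."""
    per_modality = [knn_regress(XL, t_labeled, XM, k)
                    for XL, XM in zip(labeled_blocks, mci_blocks)]
    return np.mean(per_modality, axis=0)


# Toy usage with two hypothetical 1-D modalities and four AD/NC training subjects.
labeled = [np.array([[0.0], [1.0], [10.0], [11.0]]),
           np.array([[5.0], [6.0], [0.0], [1.0]])]
scores = np.array([30.0, 29.0, 20.0, 19.0])   # e.g., MMSE scores of AD/NC training subjects
mci = [np.array([[0.4]]), np.array([[0.2]])]
est = estimate_mci_scores(labeled, scores, mci, k=2)  # (29.5 + 19.5) / 2 = 24.5
```

In the toy data the two modalities point to different neighbors, and the final estimate lands between them, which is the averaging behavior described above.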

Fig. 3

Distributions of AD, NC, and MCI subjects in the CSF feature space. The X, Y, and Z axes are CSF Aβ42, CSF t-tau, and CSF p-tau, respectively

Fig. 4

The flowchart of estimating clinical scores for MCI subjects

M-RVR Based Recursive Sample Selection of the Most Informative MCI Subjects

After estimating the clinical score of each MCI subject, one could simply merge these ‘relabeled’ MCI subjects with the AD and NC training subjects to train a regression model. However, to further improve performance, we propose to first select the most informative MCI subjects and then use them together with the AD and NC training subjects to train the regression model. The flowchart for selecting the most informative MCI subjects is shown in Fig. 5. Specifically, let L denote the labeled set, which initially contains only the AD and NC training subjects, and let \( \mathbf{U}=[{{\mathbf{u}}_1},\ldots,{{\mathbf{u}}_i},\ldots,{{\mathbf{u}}_r}] \) denote the unlabeled set, which initially contains all MCI subjects, with their clinical scores estimated from the AD and NC training subjects by k-NN as target values. In each iteration, we combine each sample in U individually with all samples in L to train an M-RVR regression model, evaluate it on L, and record the corresponding root-mean-square error (RMSE). The sample \( {{\mathbf{u}}_{{{i_0}}}} \) with the minimum RMSE (i.e., the most informative one) is then added to L and removed from U. This procedure is repeated for T iterations to select the most informative subset of MCI subjects.
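The recursive selection loop can be sketched as below. The `fit_predict` argument stands in for M-RVR (any regressor with the same call shape works). Following the text literally, each candidate model is trained on L plus one MCI sample and its RMSE is computed on the current L. The Gaussian kernel ridge regressor used as the default is a hypothetical stand-in, and all function names are illustrative, not the authors' code.

```python
import numpy as np


def rmse(y, yhat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))


def kernel_ridge_fit_predict(X_tr, t_tr, X_ev, sigma=4.0, lam=1e-2):
    """Hypothetical stand-in for M-RVR: Gaussian-kernel ridge regression."""
    def k(A, B):
        return np.exp(-np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2) / (2 * sigma ** 2))
    coef = np.linalg.solve(k(X_tr, X_tr) + lam * np.eye(len(X_tr)), t_tr)
    return k(X_ev, X_tr) @ coef


def greedy_select(X_L, t_L, X_U, t_U, T, fit_predict=kernel_ridge_fit_predict):
    """Select up to T samples from U (MCI subjects with k-NN-estimated scores t_U)."""
    XL, tL = np.asarray(X_L, float), np.asarray(t_L, float)
    X_U, t_U = np.asarray(X_U, float), np.asarray(t_U, float)
    remaining, selected = list(range(len(X_U))), []
    for _ in range(min(T, len(remaining))):
        errors = []
        for i in remaining:
            # Train on L plus candidate u_i, then measure RMSE on the current L.
            X_tr = np.vstack([XL, X_U[i:i + 1]])
            t_tr = np.append(tL, t_U[i])
            errors.append(rmse(tL, fit_predict(X_tr, t_tr, XL)))
        best = remaining[int(np.argmin(errors))]    # u_{i0}: minimum RMSE
        selected.append(best)
        remaining.remove(best)                      # delete u_{i0} from U ...
        XL = np.vstack([XL, X_U[best:best + 1]])    # ... and add it to L
        tL = np.append(tL, t_U[best])
    return selected


# Toy usage with random data standing in for AD/NC (L) and MCI (U) subjects.
rng = np.random.default_rng(1)
X_L, X_U = rng.standard_normal((20, 5)), rng.standard_normal((10, 5))
t_L, t_U = rng.normal(25, 3, 20), rng.normal(25, 3, 10)
picked = greedy_select(X_L, t_L, X_U, t_U, T=3)
```

Each iteration refits the model once per remaining candidate, so the selection costs on the order of T × |U| model fits; with at most 99 MCI subjects this stays tractable.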

Fig. 5

The flowchart of M-RVR based recursive sample selection

Finally, we trained the M-RVR model using the AD and NC training subjects together with the selected informative MCI subjects (with their estimated clinical scores as targets), and used it to estimate the clinical scores of new test subjects.

Validation

To evaluate the performance of the proposed method, we adopted two popular performance measures: the root-mean-square error (RMSE) and the correlation coefficient (CORR). In our experiments, we used 10-fold cross-validation to divide the labeled sample set (i.e., AD and NC subjects) into training and testing sets. Specifically, the labeled sample set was partitioned into 10 subsets of approximately equal size. In each fold, the samples of one subset were used for testing and the samples of the remaining 9 subsets were used for training. This process was repeated 10 times independently to avoid any bias introduced by the random partitioning of the dataset. The RVR learner was implemented using the Sparse Bayesian toolbox with a Gaussian kernel. The iteration number T (1 ≤ T ≤ 99), the Gaussian kernel width σ ∈ {2, 4, 8, 16, 32, 64, 128, 256, 512}, and the number of nearest neighbors k (1 ≤ k ≤ 20) were determined by an inner cross-validation on the training samples. Similarly, the M-RVR weights were learned on the training samples through a grid search from 0 to 1 with a step size of 0.1. Also, for the features of each modality in both the labeled and unlabeled samples, we adopted the same feature normalization scheme as in (Zhang et al. 2011).
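The two performance measures and the repeated 10-fold partitioning can be sketched as follows. This sketch assumes that CORR is the Pearson correlation coefficient between actual and estimated scores and that folds are formed by random permutation; the inner cross-validation for T, σ, k, and the modality weights is omitted. With 51 AD and 52 NC subjects, the labeled set has 103 samples, so the folds are only approximately equal in size. Function names are hypothetical.

```python
import numpy as np


def rmse(y, yhat):
    """Root-mean-square error between actual and estimated scores."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def corr(y, yhat):
    """Pearson correlation coefficient between actual and estimated scores."""
    return float(np.corrcoef(y, yhat)[0, 1])


def repeated_kfold(n, n_folds=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for n_repeats independent random k-fold partitions."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(n), n_folds)
        for f in range(n_folds):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            yield train, test


# 103 labeled subjects (51 AD + 52 NC) -> 10 repeats x 10 folds = 100 train/test splits.
splits = list(repeated_kfold(103))
```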

Results

Comparison on Different Combinations of MRI, PET and CSF Modalities

We first evaluated the performance of SM-RVR on all seven possible combinations of the MRI, PET, and CSF modalities. Table 2 shows the performance of SM-RVR for each combination. Figures 6 and 7 show the corresponding scatter plots of actual versus estimated clinical scores for MMSE and ADAS-Cog, respectively. As Table 2 and Figs. 6 and 7 show, the combination of MRI, PET, and CSF consistently achieves better results than both the single-modality and two-modality cases. Specifically, SM-RVR using all three modalities achieves an RMSE of 1.92 and a CORR of 0.80 for MMSE scores, and an RMSE of 4.45 and a CORR of 0.78 for ADAS-Cog scores (Table 2). Table 2 also indicates that using two modalities improves regression performance compared with the single-modality cases, although it remains inferior to using all three modalities. These results validate the advantage of multimodal regression over conventional single-modality regression for estimating clinical scores.

Table 2 Regression performance of SM-RVR with respect to different combinations of MRI, PET and CSF modalities
Fig. 6

Scatter plots of actual MMSE scores vs. the estimated MMSE scores for seven different combinations of modalities. a MRI, b CSF, c PET, d MRI + CSF, e MRI + PET, f CSF + PET, and g MRI + CSF + PET

Fig. 7

Scatter plots of actual ADAS-Cog scores vs. the estimated ADAS-Cog scores for seven different combinations of modalities. a MRI, b CSF, c PET, d MRI + CSF, e MRI + PET, f CSF + PET, and g MRI + CSF + PET

Furthermore, to investigate the effect of the combining weights of MRI, PET, and CSF (\( {C_{MRI}} \), \( {C_{PET}} \), and \( {C_{CSF}} \)) on the estimation of MMSE and ADAS-Cog scores by SM-RVR, we tested all possible weight values from 0 to 1 with a step size of 0.1, under the constraint \( {C_{MRI}}+{C_{PET}}+{C_{CSF}}=1 \). Figures 8 and 9 show the MMSE and ADAS-Cog estimation results, respectively, for different combining weights. Figures 8(a) and 9(a) show the RMSE, and Figs. 8(b) and 9(b) show the correlation coefficient (CORR). Note that in each plot of Figs. 8 and 9, only the squares in the upper triangular part have valid values because of the constraint \( {C_{CSF}}+{C_{MRI}}+{C_{PET}}=1 \). In each plot, the three vertices of the upper triangle, i.e., the top-left, top-right, and bottom-left squares, denote the single-modality regression results. Moreover, the three edges of the upper triangle (excluding the vertices) denote the two-modality regression results using MRI + CSF (\( {C_{PET}}=0 \)), MRI + PET (\( {C_{CSF}}=0 \)), and CSF + PET (\( {C_{MRI}}=0 \)). Figures 8(a) and 9(a) show that most RMSE values in the interior of the upper triangle are smaller than those on the vertices and edges. Likewise, Figs. 8(b) and 9(b) show that most CORR values in the interior are larger than those on the vertices and edges. These results further confirm that combining three modalities achieves better regression results than combining only two modalities or using a single modality.
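The constrained weight grid can be enumerated as follows. This small sketch uses integer steps to avoid floating-point drift; the tuple order (C_CSF, C_MRI, C_PET) mirrors the figure axes, with C_PET determined by the other two. The function name is hypothetical.

```python
from itertools import product


def simplex_grid(step=0.1):
    """All (c_csf, c_mri, c_pet) on a grid of the given step with c_csf + c_mri + c_pet = 1."""
    n = round(1 / step)                      # number of grid steps, e.g. 10 for step 0.1
    grid = []
    for i, j in product(range(n + 1), repeat=2):
        if i + j <= n:                       # keep only the valid (upper-triangular) squares
            grid.append((i / n, j / n, (n - i - j) / n))  # c_pet = 1 - c_csf - c_mri
    return grid


weights = simplex_grid(0.1)
interior = [w for w in weights if min(w) > 0]   # all three modalities used
vertices = [w for w in weights if max(w) == 1]  # single-modality settings
```

With a step of 0.1 this gives 66 weight settings: 3 vertices (single modality), 27 edge points (two modalities), and 36 interior points (all three modalities).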

Fig. 8

MMSE estimation results with respect to different combining weights of MRI, PET, and CSF for SM-RVR. The CSF weight is denoted as \( {C_{CSF}} \) and the MRI weight as \( {C_{MRI}} \). The PET weight \( {C_{PET}} \) is not shown, since it is determined by \( {C_{PET}}=1-{C_{CSF}}-{C_{MRI}} \). a RMSE for MMSE estimation, b CORR for MMSE estimation

Fig. 9

ADAS-Cog estimation results with respect to different combining weights of MRI, PET, and CSF for SM-RVR. The CSF weight is denoted as \( {C_{CSF}} \) and the MRI weight as \( {C_{MRI}} \). The PET weight \( {C_{PET}} \) is not shown, since it is determined by \( {C_{PET}}=1-{C_{CSF}}-{C_{MRI}} \). a RMSE for ADAS-Cog estimation, b CORR for ADAS-Cog estimation

Comparison with Supervised M-RVR and Other Variants

To investigate the efficacy of using unlabeled MCI samples for regression, we first compare SM-RVR with the supervised M-RVR (M-RVR for short). We implemented two versions of M-RVR: 1) M-RVR (AD + NC), which trains a supervised model using only the AD and NC subjects, and 2) M-RVR (AD + NC + MCI), which trains a supervised model using not only the AD and NC subjects but also the MCI subjects (with their actual clinical scores). For a fair comparison, all methods (both versions of M-RVR and SM-RVR) are validated on the same testing data, which involve only AD and NC subjects, since the clinical scores of MCI subjects are less stable due to the heterogeneity of MCI. Note that although both M-RVR (AD + NC + MCI) and SM-RVR involve MCI subjects in training, the former uses both the data and the clinical scores of the MCI subjects (together with AD and NC subjects) to train a supervised model, whereas the latter uses only the data of the MCI subjects as unlabeled data (together with the labeled AD and NC subjects) to train a semi-supervised model. The results are shown in Table 3. SM-RVR consistently outperforms both versions of M-RVR on every performance measure. These findings validate the efficacy of SM-RVR in using MCI subjects as unlabeled samples in a semi-supervised framework to improve regression performance. Furthermore, Fig. 10 plots the regression performance measures (i.e., RMSE and CORR) of SM-RVR against the number of unlabeled MCI samples used. The performance of SM-RVR improves steadily at first as the number of unlabeled MCI samples increases (and is significantly better than that of M-RVR in most cases), but it declines once the number of MCI samples exceeds a certain value. This indicates the importance of using only the selected informative MCI subjects, rather than all MCI subjects, as unlabeled samples for estimating clinical scores.

Table 3 Comparison of regression performance of SM-RVR and M-RVR
Fig. 10

Comparison of SM-RVR with UNSM-RVR and M-RVR, with respect to the use of different number of unlabeled samples (MCI subjects). a RMSE in estimating MMSE, b CORR in estimating MMSE, c RMSE in estimating ADAS-Cog, and d CORR in estimating ADAS-Cog

To further validate the efficacy of selecting the most informative MCI subjects for clinical score estimation, we also implemented a variant of SM-RVR without this selection step (denoted UNSM-RVR). In UNSM-RVR, MCI subjects (with their estimated clinical scores) are directly combined with the AD and NC training subjects to train the M-RVR regression model. Fig. 10 also plots RMSE and CORR against the number of unlabeled MCI samples for both SM-RVR and UNSM-RVR. For UNSM-RVR, we randomly chose a subset of MCI samples of a given size and directly merged them with the current training set (i.e., the AD and NC training subjects) to train the regression model. Fig. 10 shows that both curves rise with the number of MCI subjects in most cases, but SM-RVR is significantly and consistently better than UNSM-RVR in estimating both MMSE and ADAS-Cog scores. This implies that using the selected informative MCI subjects as unlabeled samples is superior to using randomly chosen MCI subjects, and also better than using all available MCI subjects for clinical score estimation.

Discussion

In this paper, we have proposed a new semi-supervised multimodal regression method, called SM-RVR, to estimate standard clinical cognitive test scores (MMSE and ADAS-Cog) from multimodal brain imaging and non-imaging data. Experimental results on 202 subjects from the ADNI database (all with MRI, FDG-PET, and CSF data) show that our method consistently and substantially improves regression performance compared with single-modality regression methods and conventional supervised multimodal regression methods. The results support our assumption that, due to the heterogeneity of MCI, the clinical scores of MCI subjects are less stable than those of AD and NC subjects, and thus provide little help in improving a supervised model trained on AD and NC subjects. On the other hand, the multimodal data of MCI subjects may contain more useful information than their clinical scores, and can substantially improve performance when used to train a semi-supervised model that treats MCI subjects as unlabeled data and AD and NC subjects as labeled data. In clinical applications, our method could help assess the pathological stage and predict the future progression of AD or other neurological diseases.

Semi-Supervised Learning

Semi-supervised learning is a recently developed machine learning technique that learns from both labeled and unlabeled samples (Adams 2009; Belkin et al. 2004; Belkin and Niyogi 2004; Belkin et al. 2006; Ding and Zhao 2010; Ni et al. 2012). In semi-supervised learning, a key issue is how to estimate labels for new samples by exploiting the relatedness between labeled and unlabeled samples. Existing semi-supervised learning methods address this issue in several ways, including: (1) using the manifold hypothesis and the graph Laplacian to capture the relatedness between labeled and unlabeled samples (Belkin et al. 2004; Belkin and Niyogi 2004; Belkin et al. 2006); (2) learning a propagable graph for semi-supervised classification and regression (Ni et al. 2012); and (3) transductive learning and the varifold Laplacian framework based on the hypothesis of a weaker varifold structure (Bruzzone et al. 2006; Ding and Zhao 2010).

Recently, some researchers have used semi-supervised classification methods to diagnose AD or MCI with brain images (Filipovych et al. 2011; Zhang and Shen 2011). In (Filipovych et al. 2011), MCI subjects are used as unlabeled samples, and a transductive SVM is adopted for MCI conversion classification. In (Zhang and Shen 2011), MCI subjects are also used as unlabeled samples, while the multimodal LapRLS method is adopted for AD vs. NC classification. Both studies suggest that semi-supervised methods can outperform the corresponding supervised methods by exploiting unlabeled data (MCI subjects in both cases). However, pattern classification methods can only assign a discrete category to each subject and cannot handle the continuous scores of disease progression. Accordingly, in this paper, we propose a semi-supervised regression model for estimating continuous clinical cognitive test scores from multimodal data. Experimental results demonstrate that our method achieves better regression performance than conventional supervised regression methods.

Multimodal Regression

Many recent studies have indicated that different modalities contain complementary information for discriminating AD, and thus many works combining different biomarker modalities for multimodal classification have been reported (Bouwman et al. 2007a; Chetelat et al. 2005; Fan et al. 2008b; Fellgiebel et al. 2007; Geroldi et al. 2006; Vemuri et al. 2009; Visser et al. 2002; Walhovd et al. 2010a). In those methods, features from all modalities are typically concatenated into a longer feature vector for multimodal classification. More recently, multiple-kernel methods have also been used for multimodal data fusion and classification, achieving better performance than feature concatenation (Hinrichs et al. 2011; Hinrichs et al. 2009b; Zhang et al. 2011).

In this paper, we also compare M-RVR and SM-RVR with the corresponding feature concatenation methods (MConcat and SMConcat, respectively), with results shown in Table 4. As Table 4 shows, both multi-kernel methods (SM-RVR and M-RVR) outperform the corresponding feature concatenation methods. For example, SM-RVR achieves an RMSE of 1.92 and a CORR of 0.80 for estimating MMSE scores, and an RMSE of 4.45 and a CORR of 0.78 for estimating ADAS-Cog scores, whereas the corresponding feature concatenation method (SMConcat) achieves RMSEs of 2.07 and 4.72 and CORRs of only 0.75 and 0.75 for estimating MMSE and ADAS-Cog scores, respectively.
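For reference, the two evaluation measures can be computed as below, assuming the standard definitions: RMSE is the root of the mean squared difference between true and estimated scores, and CORR is the Pearson correlation coefficient between them. The helper names, including `relative_reduction`, are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between true and estimated clinical scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def corr(y_true, y_pred):
    """Pearson correlation coefficient between true and estimated scores."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

def relative_reduction(baseline, improved):
    """Fractional reduction of an error measure relative to a baseline."""
    return (baseline - improved) / baseline
```

Applied to the figures above, the MMSE RMSE reduction from 2.07 (SMConcat) to 1.92 (SM-RVR) is about 7.2 % in relative terms, and the ADAS-Cog reduction from 4.72 to 4.45 is about 5.7 %.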

Table 4 Comparison of regression performance of M-RVR vs. MConcat, and SM-RVR vs. SMConcat

We also perform experiments to compare RVR with another popular regression method, support vector regression (SVR), in multimodal regression. To do this, we replace RVR with SVR in both M-RVR and SM-RVR, and denote the resulting methods as M-SVR and SM-SVR, respectively. Table 5 compares these methods. It shows that both RVR-based methods (SM-RVR and M-RVR) achieve better performance than the corresponding SVR-based methods (SM-SVR and M-SVR) in this high-dimensional pattern regression setting.
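To clarify what distinguishes RVR from SVR, the following is a minimal relevance vector regression in NumPy using Tipping's standard re-estimation rules for the per-weight prior precisions and the noise precision. It is a sketch under stated assumptions, not the Sparse Bayesian toolbox used in this paper: it assumes an RBF kernel with a hypothetical width `gamma`, a fixed iteration count, and a pruning threshold `alpha_max`, and it omits the toolbox's fast marginal-likelihood algorithm. The names `fit_rvr` and `predict_rvr` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_rvr(X, t, gamma=1.0, n_iter=200, alpha_max=1e9):
    """Minimal sparse Bayesian (relevance vector) regression."""
    n = X.shape[0]
    Phi = np.hstack([np.ones((n, 1)), rbf_kernel(X, X, gamma)])  # bias + one basis per sample
    keep = np.arange(Phi.shape[1])        # indices of basis functions still in the model
    alpha = np.ones(Phi.shape[1])         # prior precision of each weight
    beta = 1.0 / max(np.var(t), 1e-6)     # noise precision
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))  # posterior covariance
        mu = beta * Sigma @ P.T @ t                                  # posterior mean
        g = np.clip(1.0 - alpha[keep] * np.diag(Sigma), 0.0, 1.0)   # how well each weight is determined
        # alpha <- g / mu^2; a weight with g == 0 carries no information, so send alpha to infinity
        alpha[keep] = np.where(g > 1e-12, g / (mu ** 2 + 1e-12), np.inf)
        resid = np.sum((t - P @ mu) ** 2)
        beta = max(n - g.sum(), 1e-6) / max(resid, 1e-12)
        keep = keep[alpha[keep] < alpha_max]  # prune basis functions whose weights collapse to zero
    P = Phi[:, keep]
    Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))
    mu = beta * Sigma @ P.T @ t
    return {"X": X, "gamma": gamma, "keep": keep, "mu": mu}

def predict_rvr(model, X_new):
    """Posterior-mean prediction using only the retained (relevance) basis functions."""
    Phi = np.hstack([np.ones((X_new.shape[0], 1)), rbf_kernel(X_new, model["X"], model["gamma"])])
    return Phi[:, model["keep"]] @ model["mu"]
```

Unlike SVR, whose regularization constant and epsilon-tube width must be tuned, the weight precisions and the noise precision here are estimated from the data, and most basis functions are pruned, leaving a sparse set of relevance vectors; in this sketch only the kernel width remains a free parameter.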

Table 5 Comparison of regression performance of M-RVR vs. M-SVR, and SM-RVR vs. SM-SVR

Estimating Clinical Scores

Several recent works in AD research have studied estimating clinical scores from brain images (Fan et al. 2010; Stonnington et al. 2010; Wang et al. 2010). Notably, most existing works on estimating clinical scores are based on supervised regression models using single-modality data. In our previous work (Zhang et al. 2012), we developed a multimodal multi-task learning model for jointly estimating multiple clinical scores from multimodal imaging data, which showed improved performance over single-modality methods. However, the model proposed in (Zhang et al. 2012) is still a supervised regression model and thus cannot effectively integrate MCI subjects (whose clinical scores are less stable) to help build the regression model.

For a clearer comparison, Table 6 lists several state-of-the-art methods for estimating clinical scores, along with their reported performance. Specifically, in (Wang et al. 2010), a bagging RVR was adopted to estimate the MMSE score from baseline MRI data, achieving a best RMSE of 3.29 and a correlation coefficient of 0.76 on 23 AD, 74 MCI, and 22 NC subjects. In (Fan et al. 2010), a recursive feature elimination strategy for feature selection was combined with RVR to estimate MMSE and ADAS-Cog scores from structural MRI data, achieving best RMSEs of 2.12 and 5.03 and CORRs of 0.57 and 0.52 for MMSE and ADAS-Cog, respectively, on 52 AD, 148 MCI, and 64 NC subjects. In (Stonnington et al. 2010), RVR was adopted to estimate MMSE and ADAS-Cog scores from MRI data, achieving best CORRs of 0.7 and 0.57 for MMSE and ADAS-Cog, respectively, using dataset 1 (73 AD, 91 NC), dataset 2 (113 AD, 351 MCI, 122 NC), and dataset 3 (39 AD, 92 MCI, 32 NC). In contrast, our proposed SM-RVR achieves a best RMSE of 1.92 and a CORR of 0.80 for the MMSE score, and an RMSE of 4.45 and a CORR of 0.78 for the ADAS-Cog score. Although these methods were evaluated on different subject sets, the results support the advantage of our method over conventional supervised and single-modality regression methods.

Table 6 Comparison with the state-of-the-art methods

Predicting the Future Conversion of MCI Subjects

In addition to estimating clinical scores, the proposed SM-RVR method can also be used to predict the future conversion of MCI subjects from multimodal imaging and non-imaging biomarkers. Although a full investigation of this issue is beyond the main scope of this paper, we examine the classification performance of SM-RVR on the same dataset of 51 AD, 52 NC, and 99 MCI subjects (including 43 MCI converters (MCI-C) and 56 MCI non-converters (MCI-NC)). To train SM-RVR for classification between MCI-C and MCI-NC, we adopt a 10-fold cross-validation strategy to split the 99 MCI subjects into training and testing sets, and use the training MCI subjects as unlabeled data, together with all AD and NC subjects as labeled data, to train a semi-supervised model. For comparison, we also report the classification results of M-RVR using the same training data as SM-RVR (i.e., the training MCI subjects and all AD and NC subjects). Unlike SM-RVR, however, M-RVR trains a supervised model by treating AD and MCI-C as one class and NC and MCI-NC as the other. On this dataset, SM-RVR achieves a classification accuracy of 69.4 % (with 72.4 % sensitivity and 67.8 % specificity), which is comparable to some recently reported results for MCI conversion prediction in the literature. In contrast, M-RVR achieves a classification accuracy of only 66.4 % (with 69.3 % sensitivity and 63.7 % specificity). Moreover, M-RVR trained only on AD and NC subjects achieves a classification accuracy of only 58.4 % (with 64.1 % sensitivity and 54.1 % specificity). These results again demonstrate the advantage of SM-RVR over the conventional methods in predicting the future conversion of MCI to AD.
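The evaluation protocol described above can be sketched with a few helpers. The names `kfold_indices`, `fold_split`, and `classification_metrics` are hypothetical; the sketch assumes a plain (non-stratified) random fold assignment and treats MCI-C as the positive class for sensitivity, details the text does not specify. The SM-RVR training step itself is omitted: in each fold, the training MCI indices would be passed to the model as unlabeled data alongside the labeled AD and NC subjects.

```python
import random

def kfold_indices(n, n_folds=10, seed=0):
    """Randomly assign indices 0..n-1 to n_folds disjoint test folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def fold_split(folds, f):
    """Return (train, test) index lists for fold f; train MCI subjects act as unlabeled data."""
    test = folds[f]
    train = [i for g, fold in enumerate(folds) if g != f for i in fold]
    return train, test

def classification_metrics(y_true, y_pred, positive="MCI-C"):
    """Accuracy, sensitivity, and specificity, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

With 99 MCI subjects and 10 folds, nine test folds hold 10 subjects and one holds 9; each subject is tested exactly once.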

Clinical Implications

Current clinical practice in Alzheimer’s disease (AD) relies on brain imaging and other diagnostic tests primarily to rule out other entities, such as vascular dementia. Although CSF amyloid and tau biomarker tests are commercially available, they are seldom used, primarily due to the lack of an effective disease-modifying therapy. The ADNI consortium dataset provides a robust sample for investigating and validating clinically relevant biomarkers along the spectrum from MCI to AD. The wealth of data available from volumetric MRI brain scans, fluorodeoxyglucose PET imaging, and CSF amyloid and tau markers, together with a standard clinical assessment test battery, is designed to facilitate the development of more useful clinical diagnostic tests and objective therapeutic outcome measures. The superior results achieved with SM-RVR using all three modalities are not surprising, as each modality reflects complementary and clinically relevant information. CSF amyloid and tau markers may be more useful in pre-clinical diagnosis and in therapeutic monitoring of specific anti-amyloid and/or anti-tau agents. Quantitative MRI measures of brain atrophy, particularly in the hippocampal regions, are more likely to reflect the episodic memory deficits that are the sine qua non of AD and the amnestic form of MCI. However, variability across MCI subjects who do not conform to the “amnestic” subtype is likely to be better accounted for by the SM-RVR multiple regression technique. Brain fluorodeoxyglucose PET imaging provides a measure of synaptic functioning and may be most useful as a predictor of clinical progression, as exemplified by predicting conversion of MCI to AD. Depending on the specific application, whether diagnostic accuracy or therapeutic monitoring, the SM-RVR methodology may be adapted to help streamline clinical therapeutic trials.

Limitations

Our proposed method has the following limitations. First, it estimates each clinical variable separately, without exploiting the inherent correlations among them or using the class labels to aid the estimation of the regression variables. Second, it relies on multimodal data (MRI, PET, and CSF) and thus requires every subject to have data from all of these modalities, which limits the size of the sample set available for study. Besides MRI, PET, and CSF, the ADNI database also contains other types of data, e.g., APOE genotype. However, because not every subject has all of these data available, we could not investigate their contribution in the current study, although in principle including data from more modalities could further improve the regression performance.

Conclusion

This paper proposes a novel semi-supervised multimodal regression method, namely SM-RVR, to estimate subjects’ clinical scores from both imaging and biological biomarkers, i.e., MRI, PET, and CSF. Our method assumes that the clinical scores of MCI subjects may be less stable than those of AD and NC subjects because of the heterogeneity of MCI, and thus that MCI subjects should be used as unlabeled data in a semi-supervised regression framework rather than as labeled data in a conventional supervised regression framework. Furthermore, we develop a scheme for selecting the most informative MCI subjects to help train the regression model. Experimental results on 202 baseline subjects from the ADNI database show that our proposed method achieves better performance than state-of-the-art pattern regression methods in estimating clinical scores.

Information Sharing Statement

The data used in this paper were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), available at http://www.adni-info.org. Source code and binary programs developed in this paper are available via our website (http://bric.unc.edu/ideagroup/). The RVM regression code is implemented using the Sparse Bayesian toolbox, available at http://www.miketipping.com/index.php?page=rvm.