Abstract
As a neurodegenerative disorder, Alzheimer’s disease (AD) is characterized by the progressive impairment of memory and other cognitive functions. Using neuroimaging measures to predict cognitive performance and track the progression of AD is therefore an important topic. Many existing cognitive performance prediction methods employ regression models to associate cognitive scores with neuroimaging measures, but these methods do not take into account the interconnected structures within the imaging data and among the cognitive scores. To address this problem, we propose a novel multi-task learning model that minimizes the k smallest singular values to uncover the underlying low-rank common subspace and jointly analyze all the imaging and clinical data. The effectiveness of our method is demonstrated by clearly improved prediction performance in all empirical AD cognitive score prediction cases.
Z. Huo and H. Huang were supported in part by NSF IIS-1117965, IIS-1302675, IIS-1344152, DBI-1356628, and NIH AG049371. D. Shen was supported in part by NIH AG041721.
1 Introduction
Accumulating scientific evidence has demonstrated that neuroimaging techniques, such as magnetic resonance imaging (MRI), are important for the detection of early Alzheimer’s Disease (AD) [2, 4, 7, 13]. Current American Academy of Neurology (AAN) guidelines [3] for dementia diagnosis recommend imaging to identify structural brain diseases that can cause cognitive impairment. Because AD is a neurodegenerative disorder characterized by progressive impairment of cognitive functions, it is important to assess the degree of brain impairment and how strongly it influences performance on cognitive tests. As a result, many studies have focused on using regression models to predict cognitive scores and track AD progression [10, 11]. In [10], voxel-based morphometry (VBM) features extracted from the entire brain were jointly analyzed by relevance vector regression to predict different clinical scores individually. However, different neuroimaging features, as well as different cognitive scores, are often interrelated. To exploit these relations, several recent studies, such as [11, 12], employed multi-task learning models to uncover the inherent structures among neuroimaging features and cognitive scores. Low-rank regularization is an effective way to extract a common subspace for multiple tasks. Although the trace norm is a widely used convex relaxation of low-rank regularization [1], it is easily dominated by the large singular values. For example, when the largest singular values of a matrix M increase, the rank of M does not change, but the trace norm of M increases correspondingly.
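The sensitivity of the trace norm to large singular values can be illustrated with a small numerical sketch. The two diagonal matrices below are hypothetical examples chosen only for illustration; they share the same rank but differ in their largest singular value.

```python
import numpy as np

def rank_and_trace_norm(M, tol=1e-10):
    """Return (rank, trace norm) of M, both read off its singular values."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol)), float(s.sum())

# Hypothetical matrices that differ only in the largest singular value.
M_small = np.diag([1.0, 0.5, 0.0])
M_large = np.diag([100.0, 0.5, 0.0])

rank_small, trace_small = rank_and_trace_norm(M_small)
rank_large, trace_large = rank_and_trace_norm(M_large)

print(rank_small, trace_small)
print(rank_large, trace_large)
```

Both matrices have rank 2, yet the trace norm grows with the leading singular value, which is the behavior the k-smallest-singular-value penalty is meant to avoid.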
To address the above problems, in this paper, we propose a novel multi-task learning model that learns the associations between neuroimaging features and cognitive scores and uncovers the low-rank common subspace among different tasks by minimizing the k smallest singular values. Our new k minimal singular values minimization regularization is a tighter relaxation than the trace norm for rank minimization, so that our new multi-task learning model can achieve better prediction performance. We derive a new optimization algorithm to solve the proposed objective function and prove its convergence. The proposed new model is applied to analyze the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort [16] data. In all empirical results, our new multi-task learning method consistently outperforms the widely used multivariate regression method, as well as different state-of-the-art multi-task learning approaches.
2 New Multi-task Learning Model
2.1 New Objective Function
In our new model, we focus on minimizing the k-smallest singular values of W and ignoring the largest singular values, such that our new regularization function is a better relaxation than trace norm. Thus, we propose to solve the following problem for multi-task learning:
Suppose there are T learning tasks, and the t-th task has \(n_t\) training data points \(X_t=[x_1^t,x_2^t,...,x_{n_t}^t] \in \mathbb {R}^{d \times n_t}\). For each data point \(x_i^t\), the label \(y_i^t\) is given, and the labels form the matrix \(Y_t=[y_1^t,y_2^t,...,y_{n_t}^t] \in \mathbb {R}^{c_t \times n_t}\) for each task t. \(W_t \in \mathbb {R}^{d\times c_t}\) is the projection matrix to be learned for task t, and \(W=[W_1,W_2,...,W_T] \in \mathbb {R}^{d\times c}\) with \(c=\sum \limits _{t=1}^T c_t\).
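Under this notation, problem (1) can plausibly be written in the following form. This is a sketch consistent with the surrounding text rather than the authors' exact display: the generic per-task loss \(f\) and the ascending ordering \(\sigma _1(W) \le \sigma _2(W) \le \cdots \) of the singular values are assumptions introduced here.

\[
\min _{W} \; \sum _{t=1}^{T} f(W_t, X_t, Y_t) \;+\; \gamma \sum _{i=1}^{k} \sigma _i(W)
\]

The first term fits each task to its own data, and the second term penalizes only the k smallest singular values of the stacked matrix W, leaving the large ones unpenalized.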
It is interesting to see that when \(\gamma \) is large enough, the k smallest singular values of the optimal solution W to problem (1) will be zero, since all singular values of a matrix are non-negative. That is, when \(\gamma \) is large enough, problem (1) is equivalent to constraining the rank of W to be at most \(r=m-k\), where \(m=\min (d,c)\) is the number of singular values of W.
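As a minimal sketch of the penalty itself, the following computes the sum of the k smallest singular values and shows that it vanishes for a matrix of rank \(m-k\). The function name and the matrix sizes are hypothetical, and \(m=\min (d,c)\) is assumed.

```python
import numpy as np

def k_smallest_singular_sum(W, k):
    """Sum of the k smallest singular values of W (k >= 1)."""
    s = np.linalg.svd(W, compute_uv=False)  # sorted in descending order
    return float(s[-k:].sum())

rng = np.random.default_rng(0)
d, c, r = 6, 4, 2          # hypothetical sizes; m = min(d, c) = 4
m = min(d, c)
k = m - r

# A rank-r matrix built as a product of two thin factors.
W_low = rng.standard_normal((d, r)) @ rng.standard_normal((r, c))
# A generic dense matrix, which has full rank m almost surely.
W_full = rng.standard_normal((d, c))

penalty_low = k_smallest_singular_sum(W_low, k)
penalty_full = k_smallest_singular_sum(W_full, k)
print(penalty_low, penalty_full)
```

The penalty is numerically zero for the rank-r matrix and strictly positive for the full-rank one, regardless of how large the leading singular values are.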
2.2 Optimization Algorithm
As per the definition of \(||W||_*\) and singular value decomposition of W, it is known that:
where \(\left\| W \right\| _*\) is the sum of all the singular values of W, the optimal value of the right term is the sum of the r largest singular values, and the optimal F and G consist of the corresponding r left and r right singular vectors of W, respectively.
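The relation described here matches a standard Ky Fan type result for singular values. Written out under assumed notation (the trace operator \(\mathrm {Tr}\), the ascending ordering of \(\sigma _i\), and the orthonormality constraints are stated here as assumptions, not quoted from the source), it reads:

\[
\sum _{i=1}^{k} \sigma _i(W) \;=\; \left\| W \right\| _* \;-\; \max _{\begin{array}{c} F \in \mathbb {R}^{d \times r},\; F^{T}F = I \\ G \in \mathbb {R}^{c \times r},\; G^{T}G = I \end{array}} \mathrm {Tr}\left( F^{T} W G \right)
\]

Since \(\left\| W \right\| _*\) sums all m singular values and the maximum equals the sum of the r largest, the difference is exactly the sum of the \(k=m-r\) smallest.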
According to Eq. (2), the objective \(J_{_{opt}}\) in Eq. (1) is equivalent to:
When W is fixed, the problem (3) becomes:
The optimal solution F to problem (4) is formed by the r left singular vectors of W corresponding to the r largest singular values, and the optimal solution G is formed by the r right singular vectors of W corresponding to the r largest singular values.
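The F and G update can be checked numerically. The sketch below assumes the subproblem is the maximization of \(\mathrm {Tr}(F^{T}WG)\) over matrices with orthonormal columns; the helper name `top_r_factors` and the sizes are hypothetical.

```python
import numpy as np

def top_r_factors(W, r):
    """Return F (d x r) and G (c x r): singular vectors for the r largest singular values."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r], Vt[:r, :].T

rng = np.random.default_rng(1)
d, c, k = 7, 5, 2
W = rng.standard_normal((d, c))
r = min(d, c) - k

F, G = top_r_factors(W, r)
s = np.linalg.svd(W, compute_uv=False)

trace_norm = s.sum()
top_r_sum = np.trace(F.T @ W @ G)        # sum of the r largest singular values
k_smallest_sum = trace_norm - top_r_sum  # sum of the k smallest singular values

# A random orthonormal pair should not exceed the value attained by the singular vectors.
F_rand, _ = np.linalg.qr(rng.standard_normal((d, r)))
G_rand, _ = np.linalg.qr(rng.standard_normal((c, r)))
random_value = np.trace(F_rand.T @ W @ G_rand)
print(top_r_sum, random_value, k_smallest_sum)
```

The singular-vector pair attains the sum of the r largest singular values, and subtracting that from the trace norm leaves the penalty on the k smallest ones.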
When F and G are fixed, we define:
the problem (3) becomes:
Using the reweighted method [6], we can solve problem (6) by iteratively solving the following problem:
where D is computed according to the solution \(W^*\) in the last iteration and is defined as:
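A minimal sketch of the reweighting step follows. It assumes the form commonly used in reweighted trace norm minimization, \(D=\frac{1}{2}(W^*W^{*T})^{-1/2}\), which is an assumption about the definition rather than a quotation of it; the small `eps` term is a hypothetical safeguard against singular \(W^*W^{*T}\).

```python
import numpy as np

def reweighting_matrix(W, eps=1e-8):
    """Assumed form D = 0.5 * (W W^T + eps*I)^(-1/2); eps is a hypothetical smoothing term."""
    d = W.shape[0]
    evals, evecs = np.linalg.eigh(W @ W.T + eps * np.eye(d))
    # Scale each eigenvector column by evals^(-1/2), then recombine.
    return 0.5 * (evecs / np.sqrt(evals)) @ evecs.T

rng = np.random.default_rng(3)
W = rng.standard_normal((4, 6))   # d <= c, so W W^T is invertible almost surely
D = reweighting_matrix(W)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
surrogate = 2.0 * np.trace(W.T @ D @ W)   # should match the trace norm of W
print(surrogate, s.sum())
```

With this choice, \(2\,\mathrm {Tr}(W^{T}DW)\) reproduces the trace norm at the current iterate and \(2DW\) matches its gradient \(UV^{T}\), which is what allows the trace norm term to be replaced by a quadratic one in each iteration.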
We can see that the subproblems for the individual tasks are independent of one another in problem (7). Thus, if we use the least squares loss function, for each task t, the objective function with respect to \(W_t\) can be written as:
We take derivatives of Eq. (9) with respect to \(b_t\) and \(W_t\), and set them to zero. The optimal solution to problem (9) is as follows:
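The closed-form update can be sketched as follows, assuming the per-task objective has the form \(\Vert W_t^{T}X_t+b_t\mathbf {1}^{T}-Y_t\Vert _F^2+\gamma (\mathrm {Tr}(W_t^{T}DW_t)-\mathrm {Tr}(W_t^{T}A_t))\) with symmetric D. This form, the matrix \(A_t\) (the columns of \(FG^{T}\) belonging to task t), and the function name are assumptions made for illustration; the data are synthetic.

```python
import numpy as np

def solve_task(X, Y, D, A_t, gamma):
    """Minimize ||W^T X + b 1^T - Y||_F^2 + gamma * (Tr(W^T D W) - Tr(W^T A_t)) over W and b.

    X: (d, n) features, Y: (c_t, n) scores, D: (d, d) symmetric, A_t: (d, c_t).
    """
    n = X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix, removes the bias
    lhs = X @ H @ X.T + gamma * D
    rhs = X @ H @ Y.T + 0.5 * gamma * A_t
    W = np.linalg.solve(lhs, rhs)              # (d, c_t)
    b = (Y - W.T @ X).mean(axis=1)             # (c_t,)
    return W, b

rng = np.random.default_rng(4)
d, n, c_t, gamma = 5, 20, 3, 0.7
X = rng.standard_normal((d, n))
Y = rng.standard_normal((c_t, n))
B = rng.standard_normal((d, d))
D = B @ B.T + np.eye(d)                        # hypothetical symmetric positive definite D
A_t = rng.standard_normal((d, c_t))

W_t, b_t = solve_task(X, Y, D, A_t, gamma)

# Both partial derivatives of the objective should vanish at the returned solution.
residual = W_t.T @ X + b_t[:, None] - Y
grad_b = 2.0 * residual.sum(axis=1)
grad_W = 2.0 * X @ residual.T + gamma * (2.0 * D @ W_t - A_t)
print(np.abs(grad_b).max(), np.abs(grad_W).max())
```

Eliminating \(b_t\) first turns the problem into a single linear system in \(W_t\), so each task costs one d-by-d solve per iteration.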
We summarize the detailed algorithm to solve the objective \(J_{_{opt}}\) in Algorithm 1.
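The alternating procedure described in this section can be assembled into a short sketch. It is not a reproduction of Algorithm 1: the initialization, the smoothing constant `eps`, the assumed form \(D=\frac{1}{2}(WW^{T}+\epsilon I)^{-1/2}\), the least squares loss, and all function names are assumptions, and the data are synthetic.

```python
import numpy as np

def solve_task(X, Y, D, A_t, gamma):
    """Closed-form minimizer of ||W^T X + b 1^T - Y||_F^2 + gamma*(Tr(W^T D W) - Tr(W^T A_t))."""
    n = X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n
    W = np.linalg.solve(X @ H @ X.T + gamma * D, X @ H @ Y.T + 0.5 * gamma * A_t)
    return W, (Y - W.T @ X).mean(axis=1)

def objective(W, tasks, offsets, gamma, k, eps):
    """Squared loss at the optimal biases plus gamma times the eps-smoothed k-smallest penalty."""
    loss = 0.0
    for t, (X, Y) in enumerate(tasks):
        R = W[:, offsets[t]:offsets[t + 1]].T @ X - Y
        loss += np.sum((R - R.mean(axis=1, keepdims=True)) ** 2)
    s = np.linalg.svd(W, compute_uv=False)
    r = len(s) - k
    # Tr((W W^T + eps I)^(1/2)), including the d - len(s) eigenvalues equal to eps
    smooth_trace = np.sqrt(s ** 2 + eps).sum() + (W.shape[0] - len(s)) * np.sqrt(eps)
    return loss + gamma * (smooth_trace - s[:r].sum())

def fit_k_smallest(tasks, gamma, k, n_iter=25, eps=1e-6):
    d = tasks[0][0].shape[0]
    offsets = np.concatenate(([0], np.cumsum([Y.shape[0] for _, Y in tasks])))
    r = min(d, int(offsets[-1])) - k
    # Initialization (an assumption): ridge-like solve with D = I/2 and A = 0.
    W = np.hstack([solve_task(X, Y, 0.5 * np.eye(d), np.zeros((d, Y.shape[0])), gamma)[0]
                   for X, Y in tasks])
    biases = [None] * len(tasks)
    history = [objective(W, tasks, offsets, gamma, k, eps)]
    for _ in range(n_iter):
        # Step 1: fix W, update F and G (only the product A = F G^T is needed).
        U, _, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :r] @ Vt[:r, :]
        # Step 2: recompute the reweighting matrix D from the current W.
        evals, evecs = np.linalg.eigh(W @ W.T + eps * np.eye(d))
        D = 0.5 * (evecs / np.sqrt(evals)) @ evecs.T
        # Step 3: solve each task independently with F, G, and D fixed.
        for t, (X, Y) in enumerate(tasks):
            cols = slice(offsets[t], offsets[t + 1])
            W[:, cols], biases[t] = solve_task(X, Y, D, A[:, cols], gamma)
        history.append(objective(W, tasks, offsets, gamma, k, eps))
    return W, biases, history

# Synthetic example: three tasks sharing the same features and a rank-2 ground truth.
rng = np.random.default_rng(0)
d, n = 8, 40
X = rng.standard_normal((d, n))
W_true = rng.standard_normal((d, 2)) @ rng.standard_normal((2, 7))
Y_all = W_true.T @ X + 0.1 * rng.standard_normal((7, n))
tasks = [(X, Y_all[0:3]), (X, Y_all[3:5]), (X, Y_all[5:7])]

W_hat, biases, history = fit_k_smallest(tasks, gamma=5.0, k=3)
print(history[0], history[-1])
```

The returned `history` records the smoothed objective after each pass, which gives a direct way to observe the non-increasing behavior analyzed in the next subsection.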
2.3 Algorithm Analysis
Algorithm 1 monotonically decreases the objective of the problem in Eq. (1) in each iteration. To prove this, we need the following lemma:
Lemma 1
For any positive definite matrices \(A,A_t \in \mathbb {R}^{m\times m}\), the following inequality holds when \(0 < p \le 2\):
It is proved in [6] that Lemma 1 holds. Based on the Lemma, we have the following theorem:
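For the case \(p=1\) used in the proof, the lemma can be probed numerically. The sketch assumes the inequality takes the form \(\mathrm {Tr}(A^{1/2})-\frac{1}{2}\mathrm {Tr}(AA_t^{-1/2}) \le \mathrm {Tr}(A_t^{1/2})-\frac{1}{2}\mathrm {Tr}(A_tA_t^{-1/2})\), which follows from the concavity of \(A \mapsto \mathrm {Tr}(A^{1/2})\); this form is an assumption about the lemma as stated in [6], and the random matrices are hypothetical.

```python
import numpy as np

def sym_power(M, p):
    """Matrix power of a symmetric positive definite matrix via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(M)
    return (evecs * evals ** p) @ evecs.T

def lemma_gap(A, A_t):
    """Right side minus left side of the assumed p = 1 inequality; expected to be >= 0."""
    inv_sqrt = sym_power(A_t, -0.5)
    lhs = np.trace(sym_power(A, 0.5)) - 0.5 * np.trace(A @ inv_sqrt)
    rhs = np.trace(sym_power(A_t, 0.5)) - 0.5 * np.trace(A_t @ inv_sqrt)
    return rhs - lhs

def random_pd(rng, m):
    """Random symmetric positive definite matrix with eigenvalues at least 0.1."""
    B = rng.standard_normal((m, m))
    return B @ B.T + 0.1 * np.eye(m)

rng = np.random.default_rng(2)
gaps = [lemma_gap(random_pd(rng, 5), random_pd(rng, 5)) for _ in range(200)]
A_same = random_pd(rng, 5)
print(min(gaps), lemma_gap(A_same, A_same))
```

The gap stays non-negative over random pairs and is zero when \(A=A_t\), consistent with equality holding only at a fixed point of the iteration.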
Theorem 1
Algorithm 1 monotonically decreases the objective of the problem in Eq. (3) in each iteration until convergence.
Proof. In each iteration, at first, we fix W and compute \(\tilde{F}\) and \(\tilde{G}\). According to the solution of Eq. (4), we know:
When \(\tilde{F}\) and \(\tilde{G}\) are fixed, the problem becomes Eq. (7). Assuming that \(\tilde{W}\) is the solution obtained in the current iteration, we have:
On the other hand, according to Lemma 1, when \(p=1\), we have:
Combining (13), (14), and (15), we arrive at:
Thus Algorithm 1 does not increase the objective function in (3) at any iteration. Note that the equalities in the above equations hold only when the algorithm converges. Therefore, Algorithm 1 monotonically decreases the objective value in each iteration until convergence.
Because we alternately solve for F, G, and W, Algorithm 1 converges to a local optimum of problem (3), which is equivalent to the proposed objective function.
3 Experimental Results and Discussions
3.1 Data Set Description
Data used in this paper were obtained from the ADNI database (adni.loni.usc.edu). One goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. For up-to-date information, we refer interested readers to visit www.adni-info.org.
The data processing steps are as follows. Each MRI T1-weighted image was first anterior commissure (AC)-posterior commissure (PC) corrected using MIPAV, intensity inhomogeneity corrected using the N3 algorithm [9], skull stripped [15] with manual editing, and cerebellum-removed [14]. We then used FAST [17] in the FSL package to segment the image into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), and further used HAMMER [8] to register the images to a common space. GM volumes obtained from 93 ROIs defined in [5], normalized by the total intracranial volume, were extracted as features. Nine cognitive scores from five independent cognitive assessments were downloaded, including three scores from RAVLT cognitive assessment; two scores from Fluency cognitive assessment (FLU); two scores from Trail making test (TRAIL). A total of 525 subjects are involved in our study, including 78 AD, 260 MCI, and 187 HC participants.
3.2 Improved Cognitive Status Prediction for Individual Assessment Tests
First, we apply the proposed method to the ADNI cohort and separately predict each of the following three sets of cognitive scores: RAVLT, TRAILS and FLUENCY. The morphometric variables are \(\{x_i\}_{i=1}^n \subset \mathbb {R}^d\), with \(d=93\) in this experiment.
We compare the proposed multi-task learning method with the three most closely related methods for cognitive performance prediction: multivariate regression (MRV), a multi-task learning model with \(\ell _{2,1}\)-norm regularization (\(\ell _{2,1}\)) [11], and a multi-task learning model with trace norm regularization (LS_TRACE) [1]. For each test case, we use 5-fold cross validation, and the prediction performance is assessed by the root mean square error (RMSE). All experimental results are reported in Table 1. The proposed method outperforms the other methods in nearly all test cases across the cognitive tasks.
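The evaluation protocol can be sketched as below. This is not the authors' pipeline: the fold assignment, the pooling of RMSE over all scores within a fold, the plain least squares baseline, and the synthetic data are assumptions made for illustration. Samples are stored as rows here (n by d), unlike the d by n convention of Sect. 2.

```python
import numpy as np

def rmse(Y_true, Y_pred):
    """Root mean square error pooled over all samples and scores."""
    return float(np.sqrt(np.mean((Y_true - Y_pred) ** 2)))

def cross_validated_rmse(X, Y, fit, predict, n_folds=5, seed=0):
    """X: (n, d) features, Y: (n, c) scores. Returns the list of per-fold RMSE values."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], Y[train_idx])
        scores.append(rmse(Y[test_idx], predict(model, X[test_idx])))
    return scores

def fit_least_squares(X, Y):
    """Multivariate least squares with an intercept column (a stand-in baseline)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return coef

def predict_least_squares(coef, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ coef

# Synthetic data: 100 subjects, 5 features, 3 scores, noise standard deviation 0.1.
rng = np.random.default_rng(5)
X = rng.standard_normal((100, 5))
Y = X @ rng.standard_normal((5, 3)) + 0.1 * rng.standard_normal((100, 3))

fold_scores = cross_validated_rmse(X, Y, fit_least_squares, predict_least_squares)
print(np.mean(fold_scores))
```

Any other model can be compared under the same folds by passing a different `fit` and `predict` pair while keeping `seed` fixed.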
The heat maps of parameter weights are shown in Fig. 1. Visualizing the parameter weights can help us locate the features that play important roles in the corresponding cognitive prediction tasks. In this way, there is much potential to identify the relevant imaging predictors and explain the effects of morphometric changes in relation to cognitive performance. Different coefficient values are represented by different colors in the heat maps. The blue and red extremes of the color scale indicate a significant effect of the corresponding features on cognitive score performance.
3.3 Improved Cognitive Performance Prediction for Joint Assessment Tests
To further evaluate the multi-task joint analysis power, we apply the proposed method to predict all five types of cognitive scores (RAVLT, TRAILS, FLUENCY) jointly. Such experiments will demonstrate how the interrelations among cognitive assessment tests are utilized to enhance the prediction performance.
Similar to the previous experiment, we also compare our method to three other related models. For each test case, we use 5-fold cross validation to evaluate the average performance of each algorithm. The prediction results are evaluated by RMSE and reported in Table 2. In all prediction cases, our method outperforms other methods.
4 Conclusion
In this paper, we proposed a new multi-task learning model that minimizes the k smallest singular values to predict the cognitive scores for complex brain disorders. The proposed low-rank regularization is a better approximation of the rank minimization regularization problem than the standard trace norm regularization, so our new multi-task learning method can uncover the shared common subspace effectively. As a result, cognitive score prediction results are enhanced by the learned hidden structures among tasks and features. We also introduced an efficient optimization algorithm to solve our proposed objective function with rigorous theoretical analysis. Our experiments were conducted on the MRI and multiple cognitive scores data of the ADNI cohort and yield promising results: (1) Prediction performance of the proposed multi-task learning model is better than all related methods in all cases; (2) Our method can predict multiple cognitive scores at the same time and has the potential to play an important role in determining cognitive functions and characterizing AD progression.
References
1. Argyriou, A., Evgeniou, T., Pontil, M.: Convex multi-task feature learning. Mach. Learn. 73(3), 243–272 (2008)
2. Batmanghelich, N., Taskar, B., Davatzikos, C.: A general and unifying framework for feature construction, in image-based pattern classification. In: Prince, J.L., Pham, D.L., Myers, K.J. (eds.) IPMI 2009. LNCS, vol. 5636, pp. 423–434. Springer, Heidelberg (2009)
3. De Leon, M., George, A., Stylopoulos, L., Smith, G., Miller, D.: Early marker for Alzheimer’s disease: the atrophic hippocampus. Lancet 334(8664), 672–673 (1989)
4. Hassabis, D., Maguire, E.A.: Deconstructing episodic memory with construction. Trends Cogn. Sci. 11(7), 299–306 (2007)
5. Kabani, N.J.: 3D anatomical atlas of the human brain. Neuroimage 7, P-0717 (1998)
6. Nie, F., Huang, H., Ding, C.H.: Low-rank matrix recovery via efficient Schatten p-norm minimization. In: AAAI (2012)
7. Rosen, H.J., Gorno-Tempini, M.L., Goldman, W., Perry, R., Schuff, N., Weiner, M., Feiwell, R., Kramer, J., Miller, B.L.: Patterns of brain atrophy in frontotemporal dementia and semantic dementia. Neurology 58(2), 198–208 (2002)
8. Shen, D., Davatzikos, C.: HAMMER: hierarchical attribute matching mechanism for elastic registration. IEEE Trans. Med. Imaging 21(11), 1421–1439 (2002)
9. Sled, J.G., Zijdenbos, A.P., Evans, A.C.: A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging 17(1), 87–97 (1998)
10. Stonnington, C.M., Chu, C., Klöppel, S., Jack Jr., C.R., Ashburner, J., Frackowiak, R.S.: Predicting clinical scores from magnetic resonance scans in Alzheimer’s disease. Neuroimage 51(4), 1405–1413 (2010)
11. Wang, H., Nie, F., Huang, H., Risacher, S., Ding, C., Saykin, A.J., Shen, L.: Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 557–562. IEEE (2011)
12. Wang, H., Nie, F., Huang, H., Risacher, S., Saykin, A.J., Shen, L., ADNI: Joint classification and regression for identifying AD-sensitive and cognition-relevant imaging biomarkers. In: 14th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 115–123 (2011)
13. Wang, H., Nie, F., Huang, H., Risacher, S.L., Saykin, A.J., Shen, L., ADNI: Identifying disease sensitive and quantitative trait relevant biomarkers from multi-dimensional heterogeneous imaging genetics data via sparse multi-modal multi-task learning. Bioinformatics 28(12), i127–i136 (2012)
14. Wang, Y., Nie, J., Yap, P.T., Li, G., Shi, F., Geng, X., Guo, L., Shen, D., Initiative, A.D.N., et al.: Knowledge-guided robust MRI brain extraction for diverse large-scale neuroimaging studies on humans and non-human primates. PloS One 9(1), e77810 (2014)
15. Wang, Y., Nie, J., Yap, P.-T., Shi, F., Guo, L., Shen, D.: Robust deformable-surface-based skull-stripping for large-scale studies. In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6893, pp. 635–642. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23626-6_78
16. Weiner, M.W., Aisen, P.S., Jack Jr., C.R., Jagust, W.J., Trojanowski, J.Q., Shaw, L., Saykin, A.J., Morris, J.C., Cairns, N., Beckett, L.A., et al.: The Alzheimer’s disease neuroimaging initiative: progress report and future plans. Alzheimer’s Dement. 6(3), 202–211 (2010)
17. Zhang, Y., Brady, M., Smith, S.: Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging 20(1), 45–57 (2001)
© 2016 Springer International Publishing AG
Huo, Z., Shen, D., Huang, H. (2016). New Multi-task Learning Model to Predict Alzheimer’s Disease Cognitive Assessment. In: Ourselin, S., Joskowicz, L., Sabuncu, M., Unal, G., Wells, W. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. MICCAI 2016. Lecture Notes in Computer Science(), vol 9900. Springer, Cham. https://doi.org/10.1007/978-3-319-46720-7_37
DOI: https://doi.org/10.1007/978-3-319-46720-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46719-1
Online ISBN: 978-3-319-46720-7
eBook Packages: Computer Science, Computer Science (R0)