Abstract
While functional magnetic resonance imaging (fMRI) is important for healthcare/neuroscience applications, it is challenging to classify or interpret due to its multi-dimensional structure, high dimensionality, and small number of samples available. Recent sparse multilinear regression methods based on tensor are emerging as promising solutions for fMRI. Particularly, the newly proposed tensor singular value decomposition (t-SVD) sheds light on new directions. In this work, we study t-SVD for sparse multilinear regression and propose a Sparse tubal-regularized multilinear regression (Sturm) method for fMRI. Specifically, the Sturm model performs multilinear regression with two regularization terms: a tubal tensor nuclear norm based on t-SVD and a standard \(\ell _1\) norm. An optimization algorithm under the alternating direction method of multipliers framework is derived for solving the Sturm model. We then perform experiments on four classification problems, including both resting-state fMRI for disease diagnosis and task-based fMRI for neural decoding. The results show the superior performance of Sturm in classifying fMRI using just a small number of voxels.
W. Li and J. Lou—These authors contributed equally to this work.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
1 Introduction
Brain diseases affect millions of people worldwide and impose significant challenges to healthcare systems. Functional magnetic resonance imaging (fMRI) is a key brain imaging technique for diagnosis, monitoring and treatment of brain diseases. Beyond healthcare, fMRI is also an indispensable tool in neuroscience studies [5].
Facing the challenging “large p (brain voxels) small n (samples)” problem in brain imaging, sparse learning models [9, 11] are found to be attractive on fMRI data because they can reveal the direct dependency of a response (e.g., diagnosis outcome or brain states) on a small portion of features i.e., brain voxels. Recently, tensor-based sparse multilinear regression methods are emerging as a promising direction, where tensor refers to multidimensional array. For example, a 3D fMRI volume can be seen as a 3D tensor or a third-order tensor.
Tensor-based sparse multilinear regression models relate a feature tensor with a univariate response via a coefficient tensor, generalizing Lasso-based models [4] to tensor data. Regularization that promotes sparsity and low rankness is also generalized to the coefficient tensor. For example, the regularized multilinear regression and selection (Remurs) model [12] incorporates a sparse regularization term, via an \(\ell _1\) norm, and a Tucker rank-minimization term [13], via a summation of the nuclear norms (SNN) of unfolded matrices with application to task-based fMRI data. A new Tubal Tensor Nuclear Norm (TNN) [14] has recently been proposed based on the tubal rank, which originates from the tensor singular value decomposition (t-SVD) [6]. In this work, we study sparse multilinear regression under the t-SVD framework for fMRI classification. The success of TNN was limited to unsupervised learning settings such as completion/recovery and robust PCA [2]. To our knowledge, TNN has not been studied in a supervised setting yet, such as multilinear regression for predicting a response based on a set of tensor-structural samples. Moreover, the targeted fMRI classification tasks have the challenge of small sample size (relative to the feature dimension).
Our contributions are twofold: (1) We propose a Sparse tubal-regularized multilinear regression (Sturm) method that incorporates TNN regularization and a sparsity regularization on the coefficient tensor; (2) We evaluate Sturm and related methods on both resting-state and task-based fMRI classification problems, instead of only one of them as in previous work [3].
2 Method
Notations. We use lowercase, bold lowercase, bold uppercase, calligraphic uppercase letters to denote scalar, vector, matrix, and tensor, respectively. A third-order tensor \(\mathcal {A}\in \mathbb {R}^{I_1\times I_2 \times I_3}\) is addressed by three indices \(\{i_n\}\), \(n=1, 2, 3\). Each \(i_n\) usually addresses the nth mode of \(\mathcal {A}\). For example, in a 3D fMRI volume, \(I_1, I_2\) and \(I_3\) could indicate the sagittal, coronal and axial dimension, respectively.
2.1 Tubal Rank and Tubal Tensor Nuclear Norm (TNN)
Tubal rank is derived from the t-SVD as illustrated in Fig. 1, where \(\mathcal {U}\in \mathbb {R}^{I_1\times I_1\times I_3}\), \(\mathcal {V}\in \mathbb {R}^{I_2\times I_2\times I_3}\) are orthogonal tensors, and \(\mathcal {S}\in \mathbb {R}^{I_1\times I_2\times I_3}\) is an f-diagonal tensor. The number of nonzero singular tubes (along the diagonal direction) of \(\mathcal {S}\) is defined as the tubal rank.
t-SVD can be computed efficiently via the discrete Fourier transfer (DFT). Denote the Fourier transformed tensor \(\mathcal {A}\) as \(\mathcal {A}_\mathscr {F}\), \(\mathcal {A}_\mathscr {F}= \mathtt{fft}(\mathcal {A},[],3)\). As a convex relaxation for the tubal rank, the TNN of \(\mathcal {A}\in \mathbb {R}^{I_1\times I_2\times I_3}\) is defined as \( \Vert \mathcal {A}\Vert _{TNN} = \frac{1}{I_3}\sum _{i_3=1}^{I_3} \Vert \mathbf {A}_\mathscr {F}^{(i_3)}\Vert _*,\) where \(\Vert \cdot \Vert _* \) denotes the matrix nuclear norm. More about t-SVD and TNN can be found in [14].
2.2 The Sturm Model
Regularized Multilinear Regression Model. This approach trains a model from M pairs of feature tensor and associated response label, \((\mathcal {X}_m \in \mathbb {R}^{I_1\times I_2\times I_3}, y_m)\) with \(m = 1,...,M\), to relate them via a coefficient tensor \(\mathcal {W}\in \mathbb {R}^{I_1\times I_2\times I_3}\) as
where \(L(\cdot )\) is a loss function, \(\varOmega (\cdot )\) is a regularization term, \(\lambda \) is a balancing hyperparameter, and \(\langle \mathcal {X},\mathcal {W}\rangle \) denotes the inner product (a.k.a. the scalar product) of two tensors of the same size:
The State-of-the-Art Model. The Remurs [12] model has been successfully applied to task-based fMRI data. It uses a conventional least square loss function and assumes \(\mathcal {W}\) to be both sparse and low rank. The sparsity of \(\mathcal {W}\) is regularized by an \(\ell _1\) norm and the low rank by an SNN norm. However, the SNN requires unfolding \(\mathcal {W}\) into matrices, susceptible to losing some higher-order structural information. Moreover, it has been pointed out in [10] that SNN is not a tight convex relaxation of its target rank.
A New Model. The limitation of SNN motivates us to propose a Sparse tubal-regularized multilinear regression (Sturm) model which replaces SNN in Remurs with TNN. This leads to the following objective function
where \(\tau \) and \(\gamma \) are hyperparameters, and \(\Vert \mathcal {W}\Vert _{1}\) is the \(\ell _1\) norm of tensor \(\mathcal {W}\), defined as \(\Vert \mathcal {W}\Vert _1 = \sum _{i_1}\sum _{i_2}\sum _{i_3} \left| \mathcal {W}(i_1,i_2,i_3)\right| ,\) which is equivalent to the \(\ell _1\) norm of its vectorized representation \(\mathbf {w}\). Here, the TNN regularization term \(\Vert \mathcal {W}\Vert _{TNN}\) enforces low tubal rank in \(\mathcal {W}\).
Optimization Algorithm for Sturm. ADMM [1] is a standard solver for Problem (3). Thus, we derive an ADMM algorithm to optimize the Sturm objective function. We begin with introducing two auxiliary variables, \(\mathcal {A}\) and \(\mathcal {B}\) to disentangle the TNN and \(\ell _1\)-norm regularization:
Then, we introduce two Lagrangian dual variables \(\mathcal {P}\) (for \(\mathcal {A}\)) and \(\mathcal {Q}\) (for \(\mathcal {B}\)). With a Lagrangian constant \(\rho \), the augmented Lagrangian becomes,
where \(\Vert \cdot \Vert _F\) is the Frobenius norm defined as \(\Vert \mathcal {T}\Vert _F = \sqrt{\langle \mathcal {T}, \mathcal {T}\rangle }\) using Eq. (2). We further introduce two scaled dual variables \(\mathcal {P'} = \frac{1}{\rho }\mathcal {P}\) and \(\mathcal {Q'} = \frac{1}{\rho }\mathcal {Q}\) only for notational convenience. Next, we derive the update from iteration k to \(k+1\) by minimizing one variable with all the other variables fixed.
Updating \(\mathcal {A}^{k+1}\):
This can be rewritten as a linear-quadratic objective function by vectorizing all the tensors. Specifically, let \(\mathbf {a}=\mathtt{vec}(\mathcal {A})\), \(\mathbf {w}^k=\mathtt{vec}(\mathcal {W}^k)\), \(\mathbf {p'}^k=\mathtt{vec}(\mathcal {P'} ^k)\), \(\mathbf {y}=[y_1 \cdots y_M]^\top \), \(\mathbf {x}_m=\mathtt{vec}(\mathcal {X}_m)\), and \(\mathbf {X}=[\mathbf {x}_1 \cdots \mathbf {x}_M]^\top \). Then we get an equivalent objective function with the following solution:
where \(\mathbf {I}\) is an identity matrix. \(\mathcal {A}^{k+1}\) is obtained by folding (reshaping) \(\mathbf {a}^{k+1}\) into a third-order tensor, denoted as \(\mathcal {A}^{k+1} = \mathtt{tensor}_{3}(\mathbf {a}^{k+1})\).
Updating \(\mathcal {B}^{k+1}\):
The proximal operator for the TNN at tensor \(\mathcal {T}\) with parameter \(\mu \) is denoted by \(\mathtt{prox}_{\mu \Vert \cdot \Vert _{TNN}} (\mathcal {T})\) and defined as
\( \mathtt{prox}_{\mu \Vert \cdot \Vert _{TNN}} (\mathcal {T}) := \arg \min _{\mathcal {W}} \mu \Vert \mathcal {W}\Vert _{TNN} + \frac{1}{2} \Vert \mathcal {W}-\mathcal {T}\Vert _F^2\). More details of the TNN proximal operator can be found in [14].
Updating \(\mathcal {W}^{k+1}\):
It can be solved by calling the proximal operator of the \(\ell _1\) norm with parameter \(\frac{\gamma }{2\rho }\), which is simply the element-wise soft-thresholding, i.e., \(\mathtt{prox}_{\mu \Vert \cdot \Vert _1}(\mathcal {T}) = sign(\mathcal {T})(|\mathcal {T}|-\mu )_{+}\).
Updating \(\mathcal {P}^{k+1}\) and \(\mathcal {Q}^{k+1}\):
Optimization Algorithm Analysis. Sturm is an instance of the ADMM algorithm applied to linear constraint convex optimization problem, which is guaranteed to converge with a \(\frac{1}{\epsilon }\) convergence rate [1].
3 Experiments and Discussion
3.1 Classification Problems and Datasets
Resting-State fMRI for Disease Diagnosis. We use two freely available datasets. Rest 1 – ABIDE\( _{NYU \& UM}\): the two largest subsets NYU and UM from the Autism Brain Imaging Data Exchange (ABIDE)Footnote 1, which consists of 101 patients with autism spectrum disorder (ASD) and 131 healthy control subjects. Rest 2 – ADHD-200\(_{NYU}\): the NYU subset from the Attention Deficit Hyperactivity Disorder (ADHD) 200 datasetFootnote 2 with 118 ADHD patients and 98 healthy controls. The 4D raw fMRI data is reduced to 3D by either taking the average or the amplitude of low frequency fluctuation of voxel values along the time dimension. We perform experiments on both and report the best results.
Task-Based fMRI for Neural Decoding. We consider four datasets from the OpenfMRIFootnote 3 in two binary classification problems. Task 1 – Balloon vs Mixed Gamble, two gamble-related datasets with 64 subjects in total; and Task 2 – Simon vs Flanker, two recognition and response related tasks with overall 94 subjects. The OpenfMRI data is processed with a standard template following [8] to obtain the 3D statistical parametric maps (SPMs) for each brain condition.
3.2 Algorithms and Evaluation Settings
Algorithms. Sturm and Sturm + SVM (support vector machine) are compared against the following four algorithms and three additional algorithms combining with SVM.
-
SVM: a linear SVM is chosen for both speed and accuracy consideration.
-
Lasso [4]: a linear regression method with the \(\ell _1\) norm regularization.
-
Elastic Net (ENet) [15]: a linear regression method with \(\ell _1\) and \(\ell _2\) norm.
-
Remurs [12]: a multilinear regression model with \(\ell _1\) norm and Tucker rank-based SNN regularization.
SVM, Lasso, and ENet take vectorized fMRI data as input while Remurs directly takes 3D fMRI tensors as input. In addition, Lasso, ENet, Remurs, and Sturm can also be used for feature selection. Hence, we can use the selected voxels from each of the above algorithm as input to SVM for classification i.e., Lasso + SVM, ENet + SVM, Remurs + SVM and Sturm + SVM.
Model Hyper-parameter Tuning. For Sturm, we follow the Remurs default setting [12] to set \(\rho \) to 1 and use the same set \(\{10^{-3}, 5\times 10^{-3}, 10^{-2}, \dots , 5\times 10^2, 10^3\}\) for \(\tau \) and \(\gamma \), additionally scaling the first term in Eq. (3) by a factor \(\alpha = \sqrt{(\max (I_1, I_2)\times I_3)}\) to better balance the scales of the loss function and regularization terms [7]. Ten-fold cross validation is applied for tuning hyper-parameters in all the algorithms.
Image Resizing. To improve computational efficiency and reduce the small sample size problem (and overfitting), the input 3D tensors are further re-sized into three different sizes with a factor \(\beta \), choosing from \(\{0.3, 0.5, 0.7\}\). The best accuracy from all three sizes is reported for each algorithm in this paper.
Feature Selection. In Lasso + SVM, ENet + SVM, Remurs + SVM, and Sturm + SVM, we rank the selected features by their associated absolute values of \(\mathcal {W}\) in the descending order and feed the top \(\eta \%\) of the features to SVM. We study five values of \(\eta \): \(\{1, 5, 10, 50, 100\}\) and report the best accuracy for each algorithm.
Evaluation Metric and Method. The classification accuracy is our primary evaluation metric, and we also examine the sparsity of the obtained solutions for all algorithms except SVM. The sparsity is calculated as the ratio of the number of zeros in the output coefficient tensor \(\mathcal {W}\) to its size \(I_1\times I_2 \times I_3\). In general, higher sparsity implies better interpretability [4].
3.3 Result Discussion and Visualization
Classification Accuracy. Table 1 shows the classification accuracy for all algorithms. Over the two resting-state problems, Sturm + SVM has the highest accuracy of 65.45%, and Lasso is the second-best (63.00%). Whereas, on these two task-based problems, Sturm has outperformed all other algorithms in accuracy, with 88.00% accuracy. Lasso is again the second-best in accuracy (86.30%). One interesting observation is that on task-based classification problems, Sturm + SVM has significant drop in accuracy compared with Sturm alone, and Lasso + SVM, ENet + SVM and Remurs + SVM all have lower accuracy compared to without SVM. Overall, Sturm still achieves the highest accuracy.
Model Sparsity. Table 2 presents the respective sparsity values except SVM, which uses all features so the sparsity is zero. Over the two resting-state classification problems, Sturm + SVM and Sturm are the top two algorithms with sparsity of 0.93 and 0.86, respectively. Noticeably, Lasso & ENet fail to select a sparse solution via cross validation on Rest 2. Over the two task-based problems, both ENet and ENet + SVM have the best sparsity (0.96). However, both of them have much lower accuracy than Sturm and Remurs. Over all the problems, Sturm + SVM obtains the most sparse model with the overall sparsity of 0.82, meanwhile, Sturm and Remurs have almost identical sparsity of 0.76.
Weight Map Visualization. Take Task Simon vs Flanker as an example. The selected voxels on two brain slices from the top 5% (4514 voxels, ranked by the absolute weights) of the full-brain voxels are visualized in Fig. 2 using the best performing parameters in cross-validation of Lasso, ENet, Remurs and Sturm. Figure 2 clearly shows that Sturm and Remurs select more spatially connected voxels than Lasso or ENet does, resulting in more robust and easier interpreted models for further analysis. Both Sturm and Remurs choose voxels from highly consistent regions, while Sturm produces weight maps with slightly higher visual contrast than Remurs does.
4 Conclusion
The proposed multilinear regression model (Sturm) performs regression with regularization on the tubal tensor nuclear norm (TNN), demonstrates the superior overall performance of Sturm (and Sturm + SVM, in some cases) over other methods in terms of accuracy, sparsity and weight maps. Thus, it is promising to use TNN and tubal rank regularization in a supervised setting for fMRI.
Notes
- 1.
http://fcon_1000.projects.nitrc.org/indi/abide.
- 2.
- 3.
References
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM (JACM) 58(3), 11 (2011)
Eickenberg, M., Dohmatob, E., Thirion, B., Varoquaux, G.: Grouping total variation and sparsity: statistical learning with segmenting penalties. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 685–693. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24553-9_84
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, Boca Raton (2015)
Huettel, S.A., Song, A.W., McCarthy, G.: Functional Magnetic Resonance Imaging, vol. 1. Sinauer Associates, Sunderland (2004)
Kilmer, M.E., Braman, K., Hao, N., Hoover, R.C.: Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 34(1), 148–172 (2013)
Lu, C., Feng, J., Liu, W., Lin, Z., Yan, S.: Tensor robust principal component analysis with a new tensor nuclear norm. IEEE Trans. PAMI (2019)
Poldrack, R.A., Barch, D.M., Mitchell, J., et al.: Toward open sharing of task-based fMRI data: the OpenfMRI project. Front. Neuroinf. 7, 12 (2013)
Rao, N., Cox, C., Nowak, R., et al.: Sparse overlapping sets lasso for multitask learning and its application to fMRI analysis. In: NeurIPS, pp. 2202–2210 (2013)
Romera-Paredes, B., Pontil, M.: A new convex relaxation for tensor completion. In: NeurIPS, pp. 2967–2975 (2013)
Ryali, S., Supekar, K., Abrams, D.A., Menon, V.: Sparse logistic regression for whole-brain classification of fmri data. NeuroImage 51(2), 752–764 (2010)
Song, X., Lu, H.: Multilinear regression for embedded feature selection with application to fMRI analysis. In: AAAI Conference on Artificial Intelligence (2017)
Tucker, L.R.: Some mathematical notes on three-mode factor analysis. Psychometrika 31(3), 279–311 (1966)
Zhang, Z., Aeron, S.: Exact tensor completion using t-SVD. IEEE Trans. Sig. Process. 65(6), 1511–1526 (2017)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 67(2), 301–320 (2005)
Acknowledgement
This work was supported by the grant from the UK Engineering and Physical Sciences Research Council (EP/R014507/1).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, W., Lou, J., Zhou, S., Lu, H. (2019). Sturm: Sparse Tubal-Regularized Multilinear Regression for fMRI. In: Suk, HI., Liu, M., Yan, P., Lian, C. (eds) Machine Learning in Medical Imaging. MLMI 2019. Lecture Notes in Computer Science(), vol 11861. Springer, Cham. https://doi.org/10.1007/978-3-030-32692-0_30
Download citation
DOI: https://doi.org/10.1007/978-3-030-32692-0_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32691-3
Online ISBN: 978-3-030-32692-0
eBook Packages: Computer ScienceComputer Science (R0)