1 Introduction

Alzheimer’s disease (AD) is a common form of dementia affecting millions of elderly people above the age of 65 worldwide. Mild cognitive impairment (MCI) serves as an intermediary phase between normal cognitive controls (NC) and AD, and this MCI phase has a high conversion rate to AD. As a result, there is a need for the development of sensitive, precise, and specific atrophy biomarkers for early detection of AD progression [1]. These new methods are needed to help researchers develop new treatments for Alzheimer’s, as discussed by Hua et al. [2]. Methods for early detection may further differ by the type of imaging biomarkers that can be applied [3,4,5]. For example, neuroimaging methods such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI) and structural magnetic resonance imaging (MRI) are useful in evaluating the anatomical degradation caused by the disease [6,7,8]. Over time, structural MRI of the brain has become progressively more widely employed in identifying structural changes in common aging diseases like Alzheimer’s [9]. Structural brain MRI methods can exploit biomarkers that are present in the image. These biomarkers illustrate the structural differences between healthy and diseased individuals. It is important to note that the methods may vary depending on the nature of the employed imaging biomarkers [10]. Nevertheless, owing to their ease of availability, non-invasive nature, and high image quality, MR images are the most suitable for differentiating changes in brain anatomy due to disease development and progression.

As a result of the ubiquitous use of MRI in research and medicine, simultaneous advances in neuro-informatics have led to the emergence of many free and commercial image analysis software packages over the last 15 years. These include, but are not restricted to, SPM, FSL, FreeSurfer, BrainVisa, Mindboggle, NeuroQuant and NeuroQLab. Early diagnosis of AD by structural MRI studies is a challenging task because it is difficult to quantify the patterns of structural change during the early or clinically normal phases of AD [11]. Patients at the early stages of AD are classified as MCI, but not all MCI patients convert to AD. An analysis of research and clinical reports shows that 5–10% of MCI patients convert to AD per year [12]. Voxel-based morphometry (VBM) from high-resolution T1-weighted brain MRI data has been employed for diagnosis. Furthermore, imaging biomarkers have been obtained from the processed images, such as grey matter concentration maps that are registered to a reference location to facilitate voxel-by-voxel comparisons across subjects [13]. In this work, we focus on the volumetric measurements of various brain structures, as they have an impact on dementia diagnosis. Specifically, MCI is known to be associated with volume loss in brain structures such as the hippocampus, the MTL and the entorhinal cortex, and in the total volume of the brain; this volume loss is therefore exploited for classification.

Kloppel et al. applied a support vector machine (SVM) to classify grey matter segments in T1-weighted MR scans of diagnosed AD patients and NCs, obtained from two centers with dissimilar scanning equipment in order to generalize across different medical centers [14]. Magnin et al. proposed a new SVM-based classification method for whole-brain (1.5-T) MRI to discriminate AD patients from NC subjects [15]. Here, the brain is divided into five regions of interest using a previously developed anatomically labelled template, and a mask is created to exclude voxels of the skull. Vemuri et al. [16] developed a tool for Alzheimer’s diagnosis through structural classification of MRI using SVM. As the sizes of these brain structures are collinear, it is essential to know which of them is most closely linked to severity of illness, and whether the amount of atrophy in the others explains further variation in overall symptom severity. Studies in this field typically evaluate diagnostic accuracy in distinguishing AD and MCI patients from healthy control subjects. This study proposes volumetric measurements of the hippocampus, medial temporal lobe (MTL), ventricles, amygdala, whole brain, cortical grey matter, entorhinal cortex and fusiform structures as MRI biomarkers to predict different forms of dementia, including AD and MCI. The MRI scans for the proposed work were taken from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database [17]. FreeSurfer software is employed to obtain hippocampal, MTL, and whole brain volumes, as well as ventricle, amygdala and cortical grey matter volumes, by cortical and sub-cortical segmentation. Furthermore, the SVM implementation from the LibSVM package is utilized for multi-class classification of AD, MCI and NC.

The organization of this paper is as follows. In Sect. 2, possible volume biomarkers of AD are discussed. In the first part of Sect. 3, the dataset and the FreeSurfer tool are briefly presented, followed by an explanation of the inner workings of SVM for Alzheimer’s classification. The whole process of classification is given in Fig. 1. Section 4 is devoted to discussing the performance of the presented method. Finally, Sect. 5 presents the conclusion and future work.

Fig. 1. Flowchart

2 Volume Biomarkers of AD

Manual volumetric measurement of brain structures is regarded as “the gold standard” for detecting symptoms of AD. However, it is time consuming and subject to operator bias. In comparison, automatic measuring methods such as voxel-based morphometry (VBM) are fast and are extensively employed in the field [23,24,25]. However, VBM is not able to delineate every gyrus in the brain and is criticized by some for confounding issues [26]. Lies et al. have addressed some of these issues, finding that a VBM method measures the same effects as “the gold standard” with respect to the subcortical brain structures [27]. Overall, major structures in the brain such as the hippocampus, medial temporal lobe, ventricles, amygdala, cortical grey matter, entorhinal cortex and the whole brain volume are investigated for indications of atrophy that lead to AD.

2.1 Hippocampus

The hippocampus forms a major part of the medial temporal lobe and is commonly used for AD diagnosis. Moreover, hippocampal atrophy is a well-known marker of dementia [9]. Specifically, hippocampal atrophy differentiates the three main disease stages of AD, MCI and NC [21]. It is also speculated that a low hippocampal volume can be utilized as a new diagnostic criterion for MCI patients at high risk of conversion to AD [11].

2.2 Medial Temporal Lobe

The medial temporal lobe (MTL) region contains structures that are key to long-term memory. As a result, MTL atrophy measured on structural MRI is an effective indicator for the initial diagnosis of AD. Visser et al. reported this finding in a 1999 study of 45 patients [17].

2.3 Ventricles

Ventricles are cavities in the cerebral hemispheres filled with cerebrospinal fluid, and variations in their volume indicate the presence of AD. These cavities are found to expand steadily in AD patients [20]. In particular, Apostolova et al. have reported the use of cerebral ventricular volume for measuring AD development. They claimed that the hemispheric atrophy rate calculated from ventricular enlargement correlates strongly with changes on cognitive tests and is able to capture significant variations among the stages of Alzheimer’s [18].

2.4 Amygdala

The amygdala is a primary limbic structure anatomically interconnected with the neocortex. In particular, the amygdala is involved in how emotions are processed. In cases of AD, neuronal loss and alterations in the glial cell population have been reported. In support, Poulin et al. reported that the magnitude of amygdala atrophy is considerable in AD stages [19].

2.5 Whole Brain

Volumetric MRI studies have found relationships between increasing age and decreasing brain volumes. In particular, there is an age-correlated decrease in the volumes of hippocampal, temporal and frontal lobe structures, and an increase in cerebrospinal spaces [20]. Moreover, more sensitive predictors of AD and MCI are achievable by exploiting the whole brain’s atrophy rate along with the hippocampal volume [21].

2.6 Cortical Grey Matter

MRI measurements of cortical grey matter and abnormal white matter are independently associated with dementia severity. Both biomarkers also make their own contributions to performance in MCI domains. For example, quantitative MRI provides strong confirmation that cortical grey matter volume, which reflects atrophy, and abnormal white matter volume are each separately related to dementia severity in AD subjects [22].

2.7 Entorhinal Cortex

The entorhinal cortex is a key pre-processor that feeds the nearby hippocampus. It serves as an area for memory and navigation. Examinations have confirmed this role; in addition, a few observations indicate that the entorhinal cortex is the first region to be affected in MCI cases, even earlier than the hippocampus.

3 Materials and Methods

3.1 Data

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. It was launched collectively by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). The dataset is divided into categories of AD, MCI and NC, where MCI consists of early MCI (EMCI) and late MCI (LMCI), as shown in Table 1.

Table 1. Overview of the MRI dataset

3.2 FreeSurfer Processing

FreeSurfer is one of the most widely used software packages today for volumetric analysis of the brain. It is a set of tools for cortical analysis, visualization and sub-cortical segmentation of MRI data [28]. Accurate and reliable segmentation is a necessity for volumetric analysis in dementia. Here, sub-cortical and cortical volumetric measurements were computed by FreeSurfer (version 5.3.0) using atlas-based labelling of regions of interest (ROI) [29]. Statistical output files generated during the FreeSurfer processing stream were used to obtain the hippocampus volume and the intra-cranial volume (ICV). The volumes of the medial temporal lobe, ventricles, amygdala, CGM, entorhinal cortex, fusiform, and the whole brain were computed using anatomical ROI segmentation analysis of the corresponding file: aparc.a2009s+aseg.mgz. The volume of each structure was found in MATLAB by counting the voxels carrying that structure’s colour label in the .mgz image that FreeSurfer outputs. Each calculated volume was then normalized by dividing it by the ICV. The ICV was taken from the aseg.stats files produced by FreeSurfer (surfer.nmr.mgh.harvard.edu) and serves as a head-size correction to reduce inter-individual variation. FreeSurfer processing is computationally expensive and takes several hours to process a single image. Therefore, in order to reduce computational time, eight images were processed in parallel using GNU Parallel on an 8-core machine.
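The voxel-counting and ICV normalization steps described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in the study. The label identifiers, the toy label list, the 1 mm³ voxel size and the ICV value are all assumptions made for the example, and loading a real .mgz file is deliberately left out.

```python
from collections import Counter

# Assumed label IDs for illustration; FreeSurfer's colour lookup table
# assigns one integer label to each segmented structure.
LEFT_HIPPOCAMPUS = 17
RIGHT_HIPPOCAMPUS = 53


def structure_volume_mm3(labels, label_ids, voxel_volume_mm3=1.0):
    """Volume = number of voxels carrying the given label(s) x voxel volume."""
    counts = Counter(labels)
    return sum(counts[label] for label in label_ids) * voxel_volume_mm3


def normalise_by_icv(volume_mm3, icv_mm3):
    """Divide a structure volume by the intra-cranial volume (head-size correction)."""
    if icv_mm3 <= 0:
        raise ValueError("ICV must be positive")
    return volume_mm3 / icv_mm3


# Toy flattened label volume standing in for the voxels of a segmentation image.
toy_labels = [0, 17, 17, 53, 0, 53, 53, 2, 17, 0]

hippocampus_mm3 = structure_volume_mm3(
    toy_labels, [LEFT_HIPPOCAMPUS, RIGHT_HIPPOCAMPUS]
)
# Hypothetical ICV value, used only to show the normalization step.
hippocampus_norm = normalise_by_icv(hippocampus_mm3, icv_mm3=1_500_000.0)
```

Dividing by the ICV turns each volume into a dimensionless fraction, which is what makes subjects with different head sizes comparable.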

Support Vector Machine. A support vector machine is a machine learning method that separates two classes by finding a class boundary. This boundary, the hyperplane, is chosen to maximize the margin in the given training data. The training samples that lie closest to the class boundary are called support vectors, and the margin is the distance between the support vectors and the class boundary hyperplane. The SVM classifier is based on the concept of decision planes that define decision boundaries. A decision plane is one that separates sets of objects having different class memberships. Furthermore, a classification task usually involves training and testing data, each consisting of data instances, where each instance in the training set contains one target value (class label) and several attributes (features). SVMs have the advantage that their objective function is convex, so the optimization converges to a global minimum rather than merely a local one. Moreover, the SVM is fundamentally a two-class classifier. One commonly used approach to problems involving more than two classes is the one-versus-the-rest approach, in which a separate binary SVM is trained for each class against all the others. The binary formulation is as follows:

Given a training data set with labels \(\{ (x_1,y_1),\ldots ,(x_n,y_n) \}\) where \(x_i \in R^d\) and \(y_i \in \{+1,-1\}\), and a non-linear map \(\phi : R^d \rightarrow R^H\) into a higher-dimensional space, the SVM technique solves:

$$\begin{aligned} \min _{w,\xi _i,b} \left\{ \frac{1}{2}\Vert w\Vert ^2 + C\sum _{i=1}^n \xi _i \right\} \end{aligned}$$
(1)

Subject to the constraints:

$$\begin{aligned} y_i(\phi ^T(x_i)w+b)\ge 1-\xi _i, i=1,2...n \end{aligned}$$
(2)
$$\begin{aligned} \xi _i \ge 0, i=1,2...n \end{aligned}$$
(3)

Here, w and b define a linear classifier in the feature space. According to Cover’s theorem, applying a non-linear mapping function \(\phi \) makes the transformed samples more likely to be linearly separable [30]. The regularization parameter C controls the penalty assigned to errors. The slack variables \(\xi _i\) are introduced to account for non-separable data by permitting some errors.
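To make the roles of C and the slack variables concrete, the sketch below evaluates the objective of Eq. (1) for a given w and b. It assumes the identity map for \(\phi \), so the classifier is linear in the original features, and it takes each slack at its smallest feasible value, \(\xi _i = \max (0, 1 - y_i(w \cdot x_i + b))\). The three one-dimensional samples are invented for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slack(w, b, x, y):
    """Smallest xi >= 0 that satisfies y * (w.x + b) >= 1 - xi."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))


def primal_objective(w, b, samples, labels, C):
    """0.5 * ||w||^2 + C * sum(xi), using the smallest feasible slacks."""
    slacks = [slack(w, b, x, y) for x, y in zip(samples, labels)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks


# Invented toy data: the middle sample lies inside the margin.
samples = [[2.0], [0.5], [-1.0]]
labels = [+1, +1, -1]

objective, slacks = primal_objective(
    w=[1.0], b=0.0, samples=samples, labels=labels, C=10.0
)
```

Only the sample inside the margin receives a non-zero slack, and a larger C makes that violation more expensive relative to the margin term.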

Owing to the higher dimensionality of vector variable w, the primal function in Eq. (1) is solved by its Lagrangian dual problem which consists of maximizing:

$$\begin{aligned} L_d=\sum _i \alpha _i - \frac{1}{2} \sum _{i,j} \alpha _i \alpha _j y_i y_j \phi (x_i).\phi (x_j) \end{aligned}$$
(4)

subject to constraints

$$\begin{aligned} \sum _{i=1}^n \alpha _i y_i=0 \end{aligned}$$
(5)
$$\begin{aligned} C\ge \alpha _i \ge 0, i=1,2...n \end{aligned}$$
(6)

where \(\alpha _i\) are the Lagrange multipliers corresponding to Eq. (2). It can be noted that all \(\phi \) mappings used in SVM learning occur in the form of inner products. Furthermore, Boser et al. proposed a way to model more complicated relationships by replacing the inner product with a kernel function (such as a Gaussian radial basis function, polynomial kernel or a linear kernel) [31]. This allows us to define a kernel function K, evaluated on pairs (\(x_i,x_j\)) in the original space, that replaces the inner products in the transformed space [\(\phi (x_i).\phi (x_j)\)]:

$$\begin{aligned} K(x_i,x_j)=\phi (x_i).\phi (x_j) \end{aligned}$$
(7)

This kind of kernel function allows us to simplify the solution of the dual problem considerably, because it avoids the explicit computation of the inner products in the transformed space \([\phi (x_i ).\phi (x_j )]\). Though the \(\phi \) mapping can be explicitly expressed for a linear or polynomial kernel, there is no explicit form of the \(\phi \) mapping corresponding to the Gaussian kernel; it can be demonstrated that its expansion is infinite-dimensional [32]. Mercer’s theorem makes it unnecessary to calculate \(\phi \) explicitly in these cases, and then, by introducing (7) into (4), the dual problem can finally be stated as [33]:

$$\begin{aligned} L_d=\sum _i \alpha _i - \frac{1}{2} \sum _{i,j} \alpha _i \alpha _j y_i y_j K(x_i,x_j) \end{aligned}$$
(8)
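As a small worked instance of Eq. (8), the sketch below evaluates the dual objective for a supplied kernel and checks the constraints of Eqs. (5) and (6). The two-sample data set, the choice of a linear kernel and the multiplier values are assumptions picked so that the result can be verified by hand.

```python
def linear_kernel(u, v):
    """K(u, v) = u . v, i.e. the kernel whose feature map is the identity."""
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alphas, labels, samples, kernel):
    """L_d = sum(alpha_i) - 0.5 * sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    n = len(samples)
    quadratic = sum(
        alphas[i] * alphas[j] * labels[i] * labels[j]
        * kernel(samples[i], samples[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alphas) - 0.5 * quadratic


def is_feasible(alphas, labels, C, tol=1e-9):
    """Check sum(alpha_i * y_i) = 0 and 0 <= alpha_i <= C."""
    balanced = abs(sum(a * y for a, y in zip(alphas, labels))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alphas)
    return balanced and boxed


# Invented toy problem: one sample per class, placed symmetrically.
samples = [[1.0], [-1.0]]
labels = [+1, -1]
alphas = [0.5, 0.5]

value = dual_objective(alphas, labels, samples, linear_kernel)
feasible = is_feasible(alphas, labels, C=10.0)
```

For this toy problem the multipliers give w = 1, and the dual value equals the primal value \(\frac{1}{2}\Vert w\Vert ^2 = 0.5\), as expected at the optimum of a separable problem.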

After the dual problem is solved, \(w=\sum _{i=1}^n \alpha _i y_i \phi (x_i)\), and the final result is expressed as a decision function f(x), where any test sample x lies in the original (lower-dimensional) feature space:

$$\begin{aligned} f(x)= sgn\left( \sum _{i=1}^n \alpha _i y_i K(x_i,x) +b\right) \end{aligned}$$
(9)

Furthermore, b can be easily computed from the \(\alpha _i\) that are neither zero nor C.
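The following sketch shows one way to recover b and evaluate the decision function. It relies on the fact that a support vector with \(0< \alpha _k < C\) lies exactly on the margin, so \(b = y_k - \sum _i \alpha _i y_i K(x_i,x_k)\); averaging over all such vectors is a common choice for numerical stability and is an assumption here, as are the toy data and multipliers.

```python
def linear_kernel(u, v):
    """K(u, v) = u . v."""
    return sum(a * b for a, b in zip(u, v))


def bias_from_support_vectors(alphas, labels, samples, kernel, C, tol=1e-9):
    """Average b over the support vectors with 0 < alpha_k < C."""
    estimates = []
    for k, alpha_k in enumerate(alphas):
        if tol < alpha_k < C - tol:
            weighted = sum(
                a * y * kernel(x, samples[k])
                for a, y, x in zip(alphas, labels, samples)
            )
            estimates.append(labels[k] - weighted)
    if not estimates:
        raise ValueError("no support vector with 0 < alpha < C")
    return sum(estimates) / len(estimates)


def decide(x, alphas, labels, samples, kernel, b):
    """Sign of sum_i alpha_i y_i K(x_i, x) + b, returned as +1 or -1."""
    score = sum(
        a * y * kernel(x_i, x) for a, y, x_i in zip(alphas, labels, samples)
    ) + b
    return 1 if score >= 0 else -1


# Invented toy solution: one support vector per class.
samples = [[1.0], [-1.0]]
labels = [+1, -1]
alphas = [0.5, 0.5]

b = bias_from_support_vectors(alphas, labels, samples, linear_kernel, C=10.0)
prediction_pos = decide([2.0], alphas, labels, samples, linear_kernel, b)
prediction_neg = decide([-0.3], alphas, labels, samples, linear_kernel, b)
```

Note that the kernel is always evaluated between a training sample and the test sample, so the test sample never needs to be mapped explicitly into the higher-dimensional space.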

The shape of the discriminant function depends on the kind of kernel functions adopted. A common kernel type that fulfills Mercer’s condition is the Gaussian radial basis function where \(\gamma \) controls the shape of the peaks and the data points are transformed to a higher dimension:

$$\begin{aligned} K(x_i,x_j)=exp(-\gamma \Vert x_i - x_j\Vert ^2), \gamma >0 \end{aligned}$$
(10)

where \(\gamma \) is a free parameter inversely proportional to the width of the Gaussian kernel.

A small \(\gamma \) means a Gaussian with a large variance, resulting in a stronger influence of \(x_j\). In other words, if \(x_j\) is a support vector, a small \(\gamma \) implies that the class of this support vector will influence the decision on the class of the vector \(x_i\) even if the distance between them is large (a high bias). If \(\gamma \) is large, then the variance of the Gaussian is small, implying that the support vector does not have a wide-spread influence (a low bias). A low bias is utilized because the cost of misclassification is penalized heavily. Overall, a large \(\gamma \) leads to low-bias, high-variance models, and a small \(\gamma \) leads to high-bias, low-variance models.
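The effect of \(\gamma \) on the reach of a support vector can be illustrated with a short sketch of the kernel in Eq. (10). The two points and the three \(\gamma \) values are chosen for illustration only; the points are a distance of 5 apart.

```python
import math


def rbf_kernel(u, v, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||u - v||^2), with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Illustrative points: squared distance 3^2 + 4^2 = 25.
support_vector = [0.0, 0.0]
far_point = [3.0, 4.0]

# Kernel value (influence of the support vector) for increasing gamma.
influence = {
    gamma: rbf_kernel(support_vector, far_point, gamma)
    for gamma in (0.001, 0.03125, 1.0)
}
```

The kernel value falls monotonically as \(\gamma \) grows: with the smallest value the distant point is still strongly influenced by the support vector, while with \(\gamma = 1\) the influence is practically zero.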

The FreeSurfer tool is used to obtain the volume of different brain regions such as the medial temporal lobe, ventricles, amygdala, cortical grey matter (CGM), entorhinal cortex, and fusiform in each subject. In the training data, each row is a sample, and the columns contain the above-stated features and the label for each sample. For example, the hippocampus training data for AD vs. NC classification consist of 400 rows, where each row represents a sample/subject; one column contains the feature for each sample, and one more column contains the labels: here +1 for AD and \(-1\) for NC. Training data are prepared in the same manner for all the aforementioned brain regions.
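A row layout of this kind maps directly onto the sparse text format read by the libSVM tools, in which each line holds a label followed by `index:value` pairs with indices starting at 1. The sketch below writes such lines; the helper name and the feature values are hypothetical, and the values are not taken from the study's data.

```python
def to_libsvm_line(label, features):
    """Format one sample as '<label> 1:<v1> 2:<v2> ...' (indices start at 1)."""
    parts = ["%+d" % label]
    parts += [
        "%d:%g" % (index, value)
        for index, value in enumerate(features, start=1)
    ]
    return " ".join(parts)


# Hypothetical ICV-normalized hippocampus volumes: +1 for AD, -1 for NC.
rows = [
    (+1, [0.0025]),
    (-1, [0.0041]),
]
lines = [to_libsvm_line(label, features) for label, features in rows]
```

With a single feature per sample, as in the hippocampus example, each line carries one label and one `1:value` pair; the combined-feature case simply adds further indexed columns.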

The data are scaled before the SVM is applied [34]. The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges. Another advantage is to avoid numerical difficulties during the calculation. In order to develop an SVM, the penalization parameter C and the kernel parameter \(\gamma \) must be tuned. The best C and \(\gamma \) hyper-parameters are found using grid search: given a set of candidate models whose parameter values lie on a grid, each model is trained and evaluated using 5-fold cross-validation, and the one that performs best is selected. The best C value is 512 and the best \(\gamma \) is 0.03125. Finally, from the data of 700 subjects, 75% are randomly taken for training and the remaining 25% for testing the model’s performance. Here we have implemented the SVM using the libSVM [35] software package.
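The scaling, grid search and train/test split described above can be sketched as follows. Several details are assumptions rather than the study's settings: the scaling range of [-1, 1], the power-of-two exponent ranges for the grid, the fold assignment and the random seed. The scoring function is a placeholder that stands in for a real 5-fold cross-validated SVM score; it is constructed to peak at the reported values only so that the example is deterministic.

```python
import math
import random


def min_max_scale(value, lo, hi, new_lo=-1.0, new_hi=1.0):
    """Map value from [lo, hi] to [new_lo, new_hi]; lo/hi come from training data."""
    if hi == lo:
        return 0.5 * (new_lo + new_hi)  # constant feature: place it mid-range
    return new_lo + (new_hi - new_lo) * (value - lo) / (hi - lo)


def power_of_two_grid(c_exponents, gamma_exponents):
    """All (C, gamma) pairs with C = 2^c and gamma = 2^g."""
    return [(2.0 ** c, 2.0 ** g) for c in c_exponents for g in gamma_exponents]


def k_fold_indices(n_samples, k):
    """Assign sample i to fold i % k; each fold is held out once."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


def grid_search(grid, cv_score):
    """Return the (C, gamma) pair with the highest cross-validation score."""
    return max(grid, key=lambda pair: cv_score(*pair))


def train_test_split(n_samples, train_fraction, seed):
    """Randomly split sample indices into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


# Assumed exponent ranges; the text does not state the grid that was searched.
grid = power_of_two_grid(range(-5, 16, 2), range(-15, 4, 2))


def placeholder_cv_score(C, gamma):
    """Placeholder, NOT a real result: replace with mean 5-fold CV score."""
    return -abs(math.log2(C) - 9) - abs(math.log2(gamma) + 5)


best_C, best_gamma = grid_search(grid, placeholder_cv_score)
train_idx, test_idx = train_test_split(700, 0.75, seed=0)
folds = k_fold_indices(len(train_idx), 5)
```

The minimum and maximum used for scaling should be computed on the training portion only and then reused unchanged for the test portion, so that no information from the test data leaks into training.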

4 Results and Discussions

The results presented were obtained using an 8-core machine with 8 GB of random access memory and 3 MB of cache.

The area under the curve (AUC) for two-class classification of combinations of the stages of dementia is shown in Table 2. AUC is a commonly used metric for comparing the performance of classification models. The predominant reason for using AUC as an alternative to accuracy is that it is not as sensitive to differences in class distribution between the training and test samples [36, 37]. To be precise, an AUC-driven analysis helps in selecting a correct model when the model may have been trained on a skewed data set.
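A minimal sketch of how AUC can be computed from classifier scores is given below. It uses the rank-based interpretation of AUC, namely the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The scores and labels are invented for illustration.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == -1]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented decision scores with labels +1 / -1.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, -1, 1, -1]
balanced_auc = auc(scores, labels)

# Duplicate every negative sample to skew the class distribution.
skewed_auc = auc(scores + [0.4, 0.2], labels + [-1, -1])
```

Duplicating the negative samples changes the class ratio but leaves the AUC unchanged, which illustrates why the metric is less sensitive to class distribution than accuracy.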

Table 2. AUC of different combinations of the stages of dementia using SVM

From Table 2, features from the hippocampus are shown to act as better discriminators for most stages of dementia, except for LMCI/AD and EMCI/NC. This supports the argument that the hippocampus acts as a sensitive biomarker for earlier stages of dementia. The second-highest-performing biomarker is the medial temporal lobe (MTL). Though the MTL is not the best discriminator for any individual combination, it performs best on average. Ventricles and entorhinal cortex structures are shown to be below-average discriminators, as neither is the best discriminator for even one combination of dementia stages. Moreover, combining all the biomarkers does not yield the best discriminator overall; the combination only excels at EMCI/NC classification. The CGM biomarker performs well for AD/MCI and LMCI/AD. The whole brain performs well for AD/LMCI, EMCI/AD, and EMCI/NC. The fusiform performs best for discriminating LMCI from NC. The combined features perform well for discriminating MCI and EMCI from NC. The ROC curves for AD/NC, AD/MCI and MCI/NC are shown in Figs. 2, 3 and 4 for the hippocampus features and in Figs. 5, 6 and 7 for the combined features.

Fig. 2. ROC curve plotted for Hippocampus features of AD vs. NC

Fig. 3. ROC curve plotted for Hippocampus features of AD vs. MCI

Fig. 4. ROC curve plotted for Hippocampus features of MCI vs. NC

Fig. 5. ROC curve plotted for Combined features of AD vs. NC

Fig. 6. ROC curve plotted for Combined features of AD vs. MCI

Fig. 7. ROC curve plotted for Combined features of MCI vs. NC

5 Conclusions and Future Work

In this study, we examined the accuracy and reliability of multi-class classification, based on ROC analysis, using volumetric measurements of different brain structures for an accurate diagnosis of dementia stages. Hippocampal volume measurements are the best discriminators of AD from NC, AD from MCI, and NC from MCI. The results obtained are satisfactory and are based on a database of hippocampus features. This database consisted of 400 images for AD vs. NC, 500 images for AD vs. MCI, and 500 images for NC vs. MCI, for which we achieved AUC values of 95.75%, 79.13% and 64.09%, respectively. Future work will focus on using raw data to classify stages of dementia with a deep learning approach such as a convolutional neural network. Furthermore, we would like to explore the performance of combined features of the hippocampus, CGM, and the volume of the entire brain, and how they complement each other across the several stages of dementia classification.