Introduction

Alzheimer’s disease (AD) is the most common neurodegenerative disorder in the world, characterized by progressive impairment of cognitive functions and ultimately resulting in irreversible loss of neurons. It is one of the leading causes of dementia in late life. Mild cognitive impairment (MCI) with memory loss is considered the prodromal stage and clinical precursor of AD. It can be further divided into two types: stable MCI (sMCI) and progressive MCI (pMCI). As the names suggest, pMCI subjects are more likely to deteriorate further and convert to AD, while sMCI subjects show relatively stable symptoms (Minati et al. 2009; Prince et al. 2015). Currently, there is no effective cure for AD. However, some treatments can delay its progression, which makes the early diagnosis of AD especially important. Magnetic resonance imaging (MRI) is a powerful non-invasive imaging tool for obtaining three-dimensional images of the human brain (Herrup 2011; Jacka et al. 2011). MRI of different modalities can characterize the structural and functional brain changes related to various brain development abnormalities, including AD and its progression. Specifically, structural MRI (sMRI) is widely applied for computer-aided diagnosis of AD and MCI because of its sensitivity to aging and atrophy of the human brain.

The fast development of computer vision and machine learning methods has motivated their application to the computer-aided diagnosis of AD with MR images (Liu et al. 2013a, b; Suk et al. 2015; Zhang et al. 2011). The early diagnosis of dementia through MR image computing and analysis can be performed on the whole brain or on regions of interest (ROIs). Compared with the whole brain, ROIs can provide morphological measurements of brain regions for AD diagnosis (Herrup 2011; Suk et al. 2015; Zhang et al. 2011). Suk et al. (2015) built a deep stacked auto-encoder to combine the latent information with the original features of ROIs, achieving high accuracy in AD classification. Liu et al. (2015) proposed a diagnostic framework that uses a deep stacked auto-encoder to learn high-level features of ROIs and a zero-masking strategy to fuse features from multiple image modalities.

Among all ROIs in the brain, the hippocampus is considered to be the earliest region damaged by dementia and is strongly correlated with AD. Many studies have extracted features from the hippocampus and its affiliated areas for AD diagnosis based on sMRI (Chupin et al. 2010; Cui and Liu 2018; Ho et al. 2011; Jyrki et al. 2011; Shen et al. 2012). Since hippocampal atrophy is often found in the early stage of AD, most existing methods investigate the shape and volume of the bilateral hippocampi on MR images. Chupin et al. (2010) proposed a fully automatic framework to segment the hippocampus based on probabilistic and anatomical priors, and the volumes of the segmented hippocampi were shown to be significantly associated with AD. However, volume only represents the global variation of the hippocampus and is limited by individual differences in hippocampal volume across subjects. Shape analysis was therefore proposed to focus on the morphological differences and changes of the hippocampus, further improving the efficiency of AD diagnosis (Beg et al. 2013; Emilie et al. 2009; Lindberg et al. 2012). Local atrophy of the hippocampus, which is relevant to AD diagnosis, can be captured more precisely by shape analysis methods. Lindberg et al. (2012) proposed a shape analysis method that maps the hippocampus segmentation mask into the spherical harmonic shape description (SPHARM-PDM) and found that atrophic deformation in AD was mainly distributed in the CA1 and subiculum areas. Compared with volume analysis, this method allows a more comprehensive morphological and structural analysis of the hippocampus. Cui and Liu (2018) proposed a hippocampus analysis method that builds 3D Densely Connected Convolutional Networks (DenseNets) to learn the features of hippocampal regions, which were further combined with SPHARM-PDM features for AD diagnosis. In this method, the DenseNet was built on local MR image patches of the hippocampus to learn the visual features of the MR image, while SPHARM-PDM was applied to the binary hippocampal masks to extract the shape features. These two types of features were combined to enhance AD classification. Moreover, related studies found internal differences and asymmetry in atrophy between the left and right hippocampus in AD patients (Barnes et al. 2005; Hou et al. 2013; Maruszak and Thuret 2014; Yue et al. 2018), which provides a further important reference for the early diagnosis and analysis of AD.

In this paper, we propose a novel hippocampus analysis method based on multi-channel and cascaded convolutional neural networks (CNNs), which automatically learn and extract the combined hierarchical representations of hippocampal shapes and asymmetries from binary hippocampal masks for AD diagnosis. First, hippocampus segmentation is performed on the T1-weighted MR images to generate two binary masks, and the asymmetry mask is obtained by calculating the difference between the left and right hippocampus masks. Second, multi-channel 3D CNNs are individually trained with the hippocampal and asymmetry masks to hierarchically extract both shape and asymmetry features for image classification. An upper 2D CNN is cascaded at the end of the 3D CNNs to capture the high-level correlation features. Finally, the multiscale features learned by the multi-channel and cascaded CNNs are combined for AD classification, and saliency maps are investigated to gauge the importance of the corresponding hippocampal surface locations in AD classification for interpretation. The proposed method jointly learns the combined hierarchical features of hippocampal shapes and asymmetries, and the ensemble classifier has a stronger recognition ability for AD. Our method is evaluated with the baseline T1-weighted structural MR images of 194 AD, 164 pMCI, 233 sMCI, and 216 NC subjects from the ADNI dataset.

Compared with our previous work (Cui and Liu 2018), this paper differs substantially in the investigated features and in the deep neural network architecture used for hippocampus analysis in AD diagnosis. Here, we learn the combined features of hippocampal surfaces from the binary hippocampal masks, whereas our previous work explored the detailed visual features of MR images in the hippocampal regions. The network in this paper consists of three-channel and cascaded convolutional neural networks that learn the features hierarchically, whereas the network in the previous work consists of DenseNets with a set of dense blocks.

Materials and method

In this section, we present the proposed hippocampus analysis method using the T1-weighted MR brain images. In the proposed method, a multi-channel cascaded deep CNN model is proposed to gradually learn and combine the features of hippocampal shapes and asymmetries for AD diagnosis. Figure 1 shows the flowchart of our proposed algorithm, which consists of four main steps: 1) image acquisition and pre-processing; 2) segmentation of hippocampus; 3) feature learning by multi-channel cascaded CNNs; 4) final ensemble classification. More details of these steps are provided in the following subsections.

Fig. 1 The flowchart of the proposed classification algorithm, where “3D CNNs” denotes the deep 3D CNN models to learn the shape and asymmetry features, “2D CNN” denotes the cascaded CNN model to learn the high-level features, and “Fusion by full connection” denotes a fully connected layer for the fusion of the shape and asymmetry features

Materials

The MR images in this study were collected from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. ADNI was launched in 2003 and funded as a private-public partnership, with contributions from private companies, the Foundation for the National Institutes of Health and the National Institute on Aging. It is a longitudinal multi-center study of healthy controls, mild cognitive impairment (MCI) and AD. Magnetic resonance imaging (MRI), (18F)-fluorodeoxyglucose positron emission tomography (FDG-PET), genetic and biochemical biomarkers, as well as clinical and cognitive assessments, are acquired for the participants at multiple time points. Its aim is to find the specific correlations among clinical diagnosis results, cognitive assessments, brain imaging, and genetic and biochemical biomarkers over the entire spectrum of AD. The initial five-year study, ADNI-1, was extended to ADNI-GO in 2009 by a Grand Opportunities grant, and further extended to ADNI-2 in 2011 and ADNI-3 in 2016 by competitive renewals.

In this work, we use the baseline T1-weighted MR brain imaging data from ADNI-1 for training the deep learning model and evaluating the proposed classification method. The baseline ADNI-1 dataset used here consists of 1.5 T T1-weighted MR images of 807 subjects, including 194 AD, 164 pMCI, 233 sMCI and 216 normal controls (NC). The demographic and clinical information of the studied subjects is shown in Table 1, including age, gender, education and clinical mental scores. The images were pre-processed before feature extraction. All MR brain images were corrected for intensity inhomogeneity using a nonparametric nonuniform intensity normalization algorithm, followed by skull-stripping and cerebellum removal (Sled et al. 1998; Wang et al. 2011). Affine registration with 12 degrees of freedom was then performed with default parameters to linearly register the 3D structural MR images to a template image using FSL 5.0 (Jenkinson et al. 2012; Smith et al. 2004; Woolrich et al. 2009).

Table 1 Demographic and clinical details of the studied subjects from ADNI dataset (The values are denoted as Mean ± Standard Deviation, MMSE: Mini-Mental State Examination; CDR: Clinical Dementia Rating; M: male; F: female.)

Hippocampus segmentation

In existing clinical studies, the hippocampus has repeatedly been shown to be the brain region most closely related to dementia. The decline in cognitive abilities is usually accompanied by significant hippocampal atrophy. To analyze the hippocampus, it is first necessary to segment it from the MR images. Here we develop a 3D deep U-Net with a Dice-like loss function, following Liu et al. (2020), for hippocampus segmentation on structural MR images, by which a binary mask is generated for each hippocampus. Then, the centroids of the two segmentation masks are computed for the translational alignment of the hippocampi. After segmentation and alignment, a 3D image patch of size 72 × 64 × 64 is cropped to cover each hippocampus mask and is input into the subsequent CNN model for feature extraction and classification.
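
The centroid-based alignment and cropping step can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation: the function names are hypothetical, the centroid is rounded to the nearest voxel, and zero-padding of regions that fall outside the volume is an assumption the text does not specify.

```python
import numpy as np

PATCH_SHAPE = (72, 64, 64)  # patch size stated in the text


def mask_centroid(mask):
    """Return the centroid (mean voxel coordinate) of a binary mask."""
    coords = np.argwhere(mask > 0)
    if coords.size == 0:
        raise ValueError("mask is empty")
    return coords.mean(axis=0)


def crop_centered(mask, patch_shape=PATCH_SHAPE):
    """Crop a patch of `patch_shape` centered on the mask centroid.

    Parts of the patch that fall outside the volume are filled with
    zeros (an assumption made for this sketch).
    """
    center = np.rint(mask_centroid(mask)).astype(int)
    patch = np.zeros(patch_shape, dtype=mask.dtype)
    src, dst = [], []
    for c, size, dim in zip(center, patch_shape, mask.shape):
        start = c - size // 2          # patch start in volume coordinates
        stop = start + size
        s0, s1 = max(start, 0), min(stop, dim)
        src.append(slice(s0, s1))                  # region read from the volume
        dst.append(slice(s0 - start, s1 - start))  # region written in the patch
    patch[tuple(dst)] = mask[tuple(src)]
    return patch
```

Because every patch is centered on its own mask centroid, the left and right hippocampi of all subjects end up translationally aligned inside patches of identical size.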

Due to the bilateral symmetry of the human brain, the left and right hippocampi are mirror-symmetrical to some extent. To unify the training process, the left hippocampus patch is flipped before being fed into the deep CNN model. The asymmetry of the left and right hippocampi is obtained by taking the difference between the flipped left hippocampus mask and the right one, which is the input to another deep CNN for learning features of hippocampal asymmetry. To alleviate potential overfitting and improve the robustness and generalization of the model, we also apply data augmentation in this study. Specifically, the collected 3D image patches are augmented by shifting the coordinates of the two hippocampi and their mask margin (i.e., the mask difference) by 1–2 voxels. Finally, based on the binary masks of the right and flipped left hippocampus as well as their mask margin, we construct and train three-channel CNN models to extract the features of hippocampal shapes and asymmetries, respectively, for AD classification, as shown in Fig. 1.
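
The flipping, mask-difference and shift-augmentation steps described above could look roughly like the sketch below. Several details are assumptions made only for illustration: the left-right axis is taken to be axis 0, the difference is computed as flipped left minus right (the text does not give the sign convention), and vacated voxels are zero-filled after shifting. All function names are hypothetical.

```python
import numpy as np


def flip_left(left_mask, lr_axis=0):
    """Mirror the left hippocampus mask along the (assumed) left-right axis."""
    return np.flip(left_mask, axis=lr_axis)


def asymmetry_mask(left_mask, right_mask, lr_axis=0):
    """Voxel-wise difference: flipped left minus right, values in {-1, 0, 1}."""
    flipped = flip_left(left_mask, lr_axis).astype(np.int8)
    return flipped - right_mask.astype(np.int8)


def shift_volume(vol, offset):
    """Translate `vol` by an integer offset per axis, zero-filling the gaps."""
    out = np.zeros_like(vol)
    src, dst = [], []
    for off, dim in zip(offset, vol.shape):
        off = int(off)
        if off >= 0:
            src.append(slice(0, dim - off))
            dst.append(slice(off, dim))
        else:
            src.append(slice(-off, dim))
            dst.append(slice(0, dim + off))
    out[tuple(dst)] = vol[tuple(src)]
    return out


def augment(vol, rng, max_shift=2):
    """Randomly shift a patch by up to `max_shift` voxels along each axis."""
    offset = rng.integers(-max_shift, max_shift + 1, size=vol.ndim)
    return shift_volume(vol, offset)
```

A perfectly mirror-symmetric pair of masks yields an all-zero asymmetry mask, so non-zero voxels mark where the two hippocampal shapes disagree.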

Feature learning by multi-channel cascaded convolutional neural networks

After hippocampus segmentation, several convolutional neural networks (CNNs) are investigated to model the hippocampal surface and automatically learn the complex features of the hippocampus. Compared with traditional methods, CNNs have stronger feature extraction capabilities and can be optimized automatically by gradient-based training, which greatly reduces manual intervention. Here we construct multi-channel and cascaded deep CNNs to learn the combined hierarchical features of hippocampal shapes and asymmetries, based on the hippocampus masks and their mask difference, for AD classification. The architecture of the deep 3D CNNs is shown in Table 2. Since the hippocampal mask is a 3D binary image, 3D convolutional kernels are used to efficiently encode the spatial features of the hippocampal surface.

Table 2 The architecture of deep CNNs

The first layer is an input layer that accepts a 3D binary mask image of fixed size (72 × 64 × 64 in this work). Then, the convolutional layer convolves the input image with the kernel filters, adds a bias term and applies a non-linear activation function, so that each filter generates one feature map. After the convolutional layer, a max-pooling layer down-samples the feature map and generates compact and efficient features for distinguishing images, which helps to achieve robustness to some variations. In this way, three convolutional and pooling layers are alternately stacked to gradually learn the hierarchical representation of 3D hippocampal shapes. Next, a convolutional layer followed by three fully connected layers is added to learn and flatten the high-level features. In a fully connected layer, the output of each neuron is a linear combination of all inputs from the previous layer, passed through a nonlinearity. Since the two hippocampi are symmetrically distributed on both sides of the brain, we build deep CNNs of the same architecture for both hippocampi, but train them with different masks to learn the features individually.
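
The layer operations described above can be written compactly as follows. The notation is ours and is not taken from the original text: x denotes a layer input with channels c, W and b are trainable weights and biases, v is the flattened feature vector, and o is the input to the softmax layer. The 3 × 3 × 3 kernel, 2 × 2 × 2 pooling and Tanh activation follow the settings given for this network; no padding is assumed in the index ranges.

```latex
% 3D convolution producing feature map k at voxel (i, j, l), 3x3x3 kernel
y^{(k)}_{i,j,l} = \tanh\Big( b^{(k)} + \sum_{c} \sum_{p,q,r=0}^{2}
    W^{(k)}_{c,p,q,r}\, x^{(c)}_{i+p,\,j+q,\,l+r} \Big)

% 2x2x2 max-pooling
z^{(k)}_{i,j,l} = \max_{p,q,r \in \{0,1\}} \; y^{(k)}_{2i+p,\,2j+q,\,2l+r}

% Fully connected layer acting on the flattened feature vector v
h_m = \tanh\Big( b_m + \sum_{n} W_{mn}\, v_n \Big)

% Softmax output giving the class probabilistic score
P(\text{class} = c) = \frac{\exp(o_c)}{\sum_{c'} \exp(o_{c'})}
```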

In addition, some recent studies show that the asymmetry of the two hippocampi in the human brain is an important biomarker for AD diagnosis (Gordon et al. 2013; Leung et al. 2011; Shi et al. 2010). Thus, we take the difference between the right and flipped left hippocampus masks to generate the mask margin for each subject. Then, we build a 3D deep CNN to learn the features of hippocampal asymmetry from the mask differences for AD diagnosis. In our implementation, this deep CNN has the same structure as the deep CNNs for the hippocampal masks, but it is trained with the mask differences instead of the hippocampal masks.

In this work, each deep CNN consists of 4 convolutional layers, 3 max-pooling layers, and 3 fully connected layers, as illustrated in Table 2. The first three convolutional filters are of size 3 × 3 × 3, the last convolutional filter is of size 1 × 1 × 1, and the numbers of kernels are set to 15, 25, 40 and 50 for the 4 convolutional layers, respectively. Max-pooling is applied over each 2 × 2 × 2 region, and Tanh is adopted as the activation function in these layers because of its good performance for CNNs. During the pre-training period, each CNN is optimized individually to output the class probabilistic score through a softmax layer. After the convolutional and max-pooling layers, all the 3D feature maps are flattened into a 1D vector, followed by two fully connected layers and a top-most softmax output layer. The Adadelta optimizer is used with back-propagation to tune the trainable parameters. The dropout strategy is also applied to reduce overfitting.
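
To make the layer stack concrete, the following sketch traces the feature-map sizes and counts the convolutional parameters for the configuration given above (kernels 3, 3, 3, 1; filters 15, 25, 40, 50; 2 × 2 × 2 pooling; a single-channel 72 × 64 × 64 input). The padding mode and the floor rounding in pooling are assumptions, since the text does not state them, and the helper names are hypothetical. Table 2 remains the authoritative description.

```python
# Layer list taken from the text: (kind, kernel or pool size, number of filters)
LAYERS = [
    ("conv", 3, 15), ("pool", 2, None),
    ("conv", 3, 25), ("pool", 2, None),
    ("conv", 3, 40), ("pool", 2, None),
    ("conv", 1, 50),
]


def trace_shapes(input_shape=(72, 64, 64), padding="valid"):
    """Return (kind, spatial shape, channels, parameters) for every layer.

    `padding` is "valid" (no padding) or "same"; which one the original
    network uses is not stated, so both are supported here.
    """
    shape, channels, trace = tuple(input_shape), 1, []
    for kind, size, filters in LAYERS:
        if kind == "conv":
            if padding == "valid":
                shape = tuple(s - size + 1 for s in shape)
            # weights (kernel volume x in-channels x filters) plus biases
            params = size ** 3 * channels * filters + filters
            channels = filters
        else:
            shape = tuple(s // size for s in shape)  # floor rounding assumed
            params = 0
        trace.append((kind, shape, channels, params))
    return trace


def flattened_length(trace):
    """Length of the 1D vector obtained by flattening the last feature maps."""
    _, shape, channels, _ = trace[-1]
    n = channels
    for s in shape:
        n *= s
    return n


if __name__ == "__main__":
    for row in trace_shapes():
        print(row)
```

Under the no-padding assumption the last feature maps have size 7 × 6 × 6 with 50 channels; with "same" padding they would be 9 × 8 × 8. In both cases the four convolutional layers hold 39,660 trainable parameters in total.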

The final ensemble classification

In the above sections, the three-channel CNNs are trained with the left and right hippocampus masks and their mask difference to capture the features of 3D hippocampal shapes and asymmetry. To make use of these features, we first build a cascaded 2D CNN to combine the features learned by the deep 3D CNNs from the two hippocampi, as shown in Fig. 2. The output of each 3D CNN is a feature vector of length 60. The cascaded 2D CNN learns the high-level correlation features between the two hippocampi. It consists of two convolutional layers and two fully connected layers, followed by a softmax classifier for AD classification.

Fig. 2 The architecture of cascaded CNNs to learn the combined features from the left and right hippocampus

Second, a fully connected layer is appended at the high level to combine the features learned by the three-channel CNNs and the cascaded 2D CNN to improve the AD classification. There are 60 features generated by each deep 3D CNN and 40 features learned by the cascaded 2D CNN. To train the proposed multi-channel cascaded convolutional neural networks, the three 3D CNNs are first trained individually. Then, the 2D CNN and the high-level fully connected layer followed by a softmax layer are fine-tuned to make the final classification. Our proposed method is an end-to-end deep learning network, which gradually captures the combined and hierarchical features of hippocampal shapes and asymmetry for the classification of disease status. The hippocampal features and the disease classifier are jointly learned within the proposed deep learning framework.
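
The fusion step can be illustrated with a short NumPy sketch. The feature lengths (60 per 3D CNN, 40 for the cascaded 2D CNN) come from the text; the width of the fusion layer, the Tanh activation in that layer, the two-class output and the random weights are all assumptions made only so that the example runs. It is not the trained model.

```python
import numpy as np

N_3D_FEATURES = 60   # per 3D CNN, as stated in the text
N_2D_FEATURES = 40   # cascaded 2D CNN, as stated in the text
N_HIDDEN = 64        # hypothetical width of the fusion layer
N_CLASSES = 2        # e.g. AD vs. NC


def softmax(z):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def init_params(rng):
    """Random placeholder weights; a real model would learn these."""
    n_in = 3 * N_3D_FEATURES + N_2D_FEATURES  # 220 fused features
    return {
        "W1": rng.normal(scale=0.1, size=(N_HIDDEN, n_in)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(scale=0.1, size=(N_CLASSES, N_HIDDEN)),
        "b2": np.zeros(N_CLASSES),
    }


def fuse_and_classify(f_left, f_right, f_asym, f_cascade, params):
    """Concatenate the four feature vectors and return class probabilities."""
    fused = np.concatenate([f_left, f_right, f_asym, f_cascade])
    hidden = np.tanh(params["W1"] @ fused + params["b1"])
    return softmax(params["W2"] @ hidden + params["b2"])
```

With the stated feature lengths, the fusion layer receives 3 × 60 + 40 = 220 inputs per subject.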

Experimental results

In this section, we first introduce the preparation of datasets and the implementation of the proposed method. Then, the extensive experiments on the classifications of different groups of subjects are presented. Finally, we compare the proposed method with other competing methods, followed by the discussions on the salient regions for disease diagnosis.

Datasets and implementation

The proposed classification algorithm is evaluated on the T1-weighted MR brain images from the ADNI dataset. The size of MR images after initial preprocessing is 256 × 256 × 256 voxels. After hippocampus segmentation, a 3D image patch of size 72 × 64 × 64, centered on the centroids of hippocampus mask, is cropped to cover each hippocampus mask for feature extraction.

The proposed method is implemented with the Keras framework, accelerated by an NVIDIA GeForce GTX 1080Ti GPU and running on Ubuntu 16.04 (x64). To train the 3D CNNs, the weights of the whole network are initialized uniformly following the default rules in Keras. The Adadelta optimizer is adopted with a learning rate of 0.7. The batch size is set to 32. L2 regularization and dropout are also applied to address the overfitting problem.

To better evaluate the classification performance, a 5-fold cross-validation strategy is applied to reduce the effects of random factors. Specifically, we randomly and evenly divide the dataset under study into 5 groups. For each fold, we use one group for testing, another group for validation and the rest for training the model. In the experiments, we use five classification performance measures for evaluation: classification accuracy (ACC), sensitivity (SEN), specificity (SPE), the receiver operating characteristic (ROC) curve and the area under the curve (AUC). ACC, SEN and SPE respectively reflect the proportion of all samples, positive samples (AD/MCI/pMCI subjects) and negative samples (NC/sMCI subjects) that are correctly classified. The ROC curve is a comprehensive indicator that reflects the trade-off between sensitivity and specificity as the decision threshold varies. AUC summarizes the ROC curve and measures the ability of a classifier to distinguish two classes.
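
The evaluation protocol and the scalar measures can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the validation group of each fold is taken to be the group following the test group (the text only says "another group"), no stratification by class is applied, and AUC is computed through its rank-based (Mann-Whitney) form. Function names are hypothetical.

```python
import numpy as np


def five_fold_splits(n_subjects, n_folds=5, seed=0):
    """Yield (train, val, test) index arrays for each fold."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(n_subjects), n_folds)
    for k in range(n_folds):
        v = (k + 1) % n_folds  # assumed choice of validation group
        train = np.concatenate(
            [groups[j] for j in range(n_folds) if j not in (k, v)]
        )
        yield train, groups[v], groups[k]


def acc_sen_spe(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)  # fraction of positives correctly classified
    spe = tn / (tn + fp)  # fraction of negatives correctly classified
    return float(acc), float(sen), float(spe)


def auc(y_true, scores):
    """Probability that a positive scores higher than a negative (ties = 0.5)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))
```

For example, with 3 positive and 4 negative subjects of which 2 positives and 3 negatives are predicted correctly, ACC = 5/7, SEN = 2/3 and SPE = 3/4.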

Evaluations on multi-channel cascaded CNNs

The first experiment evaluates the performance of the multi-channel cascaded CNNs for AD classification. Two 3D CNNs are trained with the left and right hippocampus masks to extract the features of hippocampal shapes. One 3D CNN is used to learn the asymmetry features from the mask difference between the left and right hippocampus. The proposed method with multi-channel cascaded CNNs learns the combined hierarchical features of hippocampal shapes and asymmetry. Thus, we compare the proposed cascaded CNNs, which extract the combined features, with the 3D CNNs that extract shape features and asymmetry features alone; these are denoted as “Proposed method”, “Shape” and “Asymmetry”, respectively. Table 3 compares the classification performances using the hippocampal shapes, the asymmetry and the proposed combination method for the classifications of AD vs. NC, MCI vs. NC and pMCI vs. sMCI. The corresponding ROC curves are displayed in Fig. 3a, b and c, respectively. It can be observed that the proposed multi-channel cascaded CNNs, which learn the combined shape and asymmetry features, perform better than the individual 3D CNNs.

Table 3 Comparison of classification performances by three deep CNNs for learning hippocampal shapes and asymmetry and the cascaded CNNs for learning the combined features on AD vs. NC, MCI vs. NC and pMCI vs. sMCI

Fig. 3 Comparison of the ROC curves with the 3D CNNs to learn hippocampal shapes and asymmetry and the proposed cascaded CNNs to learn the combined features for classifications of (a) AD vs. NC, (b) MCI vs. NC and (c) pMCI vs. sMCI

Comparison with other methods

Next, we compare our method with other methods proposed in the literature. The hippocampal volumes are first computed from the hippocampus masks of every subject. In addition, shape analysis of the bilateral hippocampi is performed on the hippocampal masks using SPHARM-MAT (a Matlab toolbox), by which morphological shape features of the hippocampus are extracted for AD classification. In our implementation, 507 SPHARM coefficients are obtained for each hippocampus mask to train a Multi-Layer Perceptron (MLP) model, which is made up of two fully connected layers for feature reduction and one softmax layer for classification. For a fair comparison, we implement these methods using the same training and test data. The same image preprocessing steps are also applied for all the compared methods. Table 4 compares the classification performances of the methods mentioned above. Furthermore, the ROC curves for the classifications of the different groups are illustrated in Fig. 4a, b and c, respectively. Our proposed method has a clear advantage over the hippocampal volume and shape analysis methods, which are also based on the binary hippocampus segmentation masks. Compared with our previous work (Cui and Liu 2018), which built a DenseNet to extract the features of MR images in the local hippocampal region, the proposed method has lower classification performance. This is because the proposed method focuses on the hippocampal surface and extracts features only from the binary hippocampus masks, which carry much less visual and appearance information than the MR images. Nevertheless, the proposed multi-channel cascaded CNNs achieve better classification performance (about 3.5–7.5% improvement in AUC) than the traditional SPHARM-PDM method, which also uses the binary hippocampal mask for shape analysis. The results demonstrate that the proposed deep learning method works well for modeling the 3D surface of the hippocampal structure for disease diagnosis.
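
For reference, the volume baseline reduces to counting mask voxels, as in the minimal sketch below. The voxel size default of 1 mm is an assumption, and the helper names are hypothetical. The second helper only records an arithmetic observation: 507 = 3 × 13², which is the number of coefficients a spherical harmonic expansion of degree 12 would give under the common convention of (L + 1)² coefficients for each of the three coordinate functions. The degree actually used is not stated in the text.

```python
import numpy as np


def hippocampal_volume(mask, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Volume in cubic millimetres: number of mask voxels times voxel volume."""
    voxel_volume = float(np.prod(voxel_size_mm))
    return float(np.count_nonzero(mask)) * voxel_volume


def spharm_coefficient_count(degree):
    """Coefficient count for x, y, z expansions up to `degree` (convention assumed)."""
    return 3 * (degree + 1) ** 2
```

The single volume value per hippocampus is a global measure, which is why it cannot capture the local surface changes that the shape-based methods target.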

Table 4 Classification performance comparison of our method with other published methods for classifications of AD vs. NC, MCI vs. NC and pMCI vs. sMCI

Fig. 4 Comparison of the ROC curves in different hippocampus analysis methods for classifications of (a) AD vs. NC, (b) MCI vs. NC and (c) pMCI vs. sMCI

Discussion

In clinical applications, identifying the relevant biomarkers is important for medical interpretation, in addition to the performance of disease diagnosis. Although the proposed cascaded convolutional neural networks can jointly extract features from both the hippocampus masks and their asymmetry to improve the classification performance of AD diagnosis, the learned features do not by themselves provide information for medical interpretation. To better identify the most significant biomarkers related to brain diseases, the visualization technique proposed by Simonyan et al. (2013) is applied to help understand the abnormalities of the hippocampal surface caused by the disease. Specifically, a class saliency map is generated by calculating the gradient of the normalized score of the correct class with respect to the input image. High saliency indicates that the corresponding brain area is more relevant to the disease for a given image. In the experiments, we generate the saliency maps of the hippocampus masks for all test images and then average them for illustration, as shown in Fig. 5. The saliency maps characterize the contribution of different sub-regions to the classification task and gauge the importance of the corresponding hippocampal surface locations in the prediction of disease for interpretation. Figure 5 (a), (b) and (c) show the saliency maps of the left hippocampus, the right hippocampus and their asymmetry for the classifications of AD vs. NC, MCI vs. NC, and pMCI vs. sMCI, respectively. Warmer colors correspond to larger saliency on the hippocampal surface for disease diagnosis, while blue and purple correspond to smaller saliency. It is observed that larger saliency values are often present over the global shape of the left and right hippocampi and their asymmetry, especially in their medial parts, and that they are more pronounced for the classification of AD vs. NC than for MCI vs. NC and pMCI vs. sMCI.
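
The gradient-based saliency computation can be illustrated on a toy model. The sketch below replaces the trained CNN with a single linear layer followed by softmax, which is an assumption made purely so that the gradient has a closed form; for the real network the same quantity would be obtained by back-propagation. The saliency is the absolute gradient of the softmax probability of the correct class with respect to each input voxel, and the maps are averaged over test images as described above. All names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def class_probability(x, W, b, c):
    """Softmax probability of class c for input volume x (toy linear model)."""
    return softmax(W @ x.ravel() + b)[c]


def saliency_map(x, W, b, c):
    """|d p_c / d x| for the toy model, reshaped to the input volume.

    For p = softmax(W x + b): d p_c / d x = p_c * (W_c - sum_k p_k W_k).
    """
    p = softmax(W @ x.ravel() + b)
    grad = p[c] * (W[c] - p @ W)
    return np.abs(grad).reshape(x.shape)


def mean_saliency(images, labels, W, b):
    """Average the per-image saliency maps of the correct classes."""
    maps = [saliency_map(x, W, b, c) for x, c in zip(images, labels)]
    return np.mean(maps, axis=0)
```

Voxels with large average values are those where a small change in the input mask would change the predicted class probability the most.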

Fig. 5 The saliency maps of the left hippocampus (left), right hippocampus (middle) and their asymmetry (right) for the classifications of (a) AD vs. NC, (b) MCI vs. NC, and (c) pMCI vs. sMCI, respectively, which indicate the spatial importance of the hippocampal surface

Conclusion

In this paper, we develop a novel deep learning classification method based on multi-channel cascaded convolutional neural networks to learn the combined hierarchical features of hippocampal shapes and asymmetry from binary hippocampal masks for AD and MCI diagnosis. The multi-channel 3D CNNs are constructed and trained with the 3D hippocampus binary masks and their mask difference to learn the features of hippocampal shapes and asymmetry. Then, a 2D CNN is cascaded on the 3D CNNs to learn the high-level features between two hippocampus masks. The final classification is made by the combination of all features from 3D CNNs and cascaded 2D CNN. The proposed method can make full use of the asymmetry and shape features to improve the disease classification. Experimental results and comparison on structural MR images from the ADNI dataset demonstrate that the proposed method has promising performances for disease classification. Besides, the saliency maps are also investigated to gauge the importance of hippocampal surface locations in disease diagnosis for interpretation.