1 Introduction

Alzheimer’s disease (AD) is the most common form of dementia; its main symptom is a significant decline in memory and other cognitive abilities. Mild cognitive impairment (MCI) is the stage between normal ageing and dementia, characterized by mild memory or other cognitive problems without significant disability. Between 40% and 60% of people with MCI progress to Alzheimer’s disease within 4 to 6 years [10]. It is therefore important to develop an automatic method for distinguishing AD, MCI and normal control (NC) subjects.

Magnetic resonance imaging (MRI) has many advantages, and MRI-based diagnosis of AD has attracted increasing attention [1]. Magnin et al. presented an automated method based on support vector machine (SVM) classification of whole-brain anatomical MRI to discriminate patients with AD from elderly control subjects [11]. Mesrob et al. partitioned the whole brain into anatomical regions and classified them with a nonlinear SVM [12]. Fung et al. compared SVM with the Fisher linear discriminant (FLD) [3]. A method that takes the sparsity of brain MRI into account was shown to improve classification performance [19]. It is widely believed that the l1-norm sparsity constraint on the coding coefficients plays a key role in the success of sparse representation based classification (SRC). However, Zhang et al. [17] argued that the success of SRC should be largely attributed to the collaborative representation of a test sample by the training samples across all classes. To address the shortage of training samples, they further proposed an effective collaborative representation based classifier (CRC) that uses l2-norm regularization.

The main pathological changes of Alzheimer’s disease visible in MRI are diffuse cortical atrophy, widening of the sulci and ventricular enlargement. Because the brain shrinks, classification according to the volumes of white matter, gray matter and cerebrospinal fluid (CSF) is theoretically possible. In previous research [8], using the volume of every block as a feature gave satisfactory results. In addition, because the sulci widen and the ventricles enlarge, classification according to the texture features of the image is also theoretically possible [21]. The Gabor wavelet transform is an important tool for signal analysis and processing in the time and spatial domains, and it has been widely used in pattern recognition tasks such as image processing and feature extraction. Recent studies showed that fusing information from multiple features helps to enhance diagnostic performance [13]. The above methods mainly focus on 2D texture features. Liu et al. [6] extracted 3D texture features to discriminate AD, MCI and NC and demonstrated their usefulness for AD diagnosis. Oppedal et al. [7] explored two different 3D local binary pattern (LBP) texture features extracted from T1 MRI of the brain, combined with a random forest classifier, in an attempt to distinguish patients with AD, patients with Lewy body dementia (LBD) and NC.

Unlike a conventional two-dimensional image with 256 levels, an MR image is three-dimensional and has a high gray-scale depth, so dimensionality reduction is necessary for diagnosing Alzheimer’s disease. The brain can be divided into different regions of interest (ROIs), and different ROIs have different relevance to AD. Some ROIs have little or no relevance, and some have similar or even identical relevance. Automated Anatomical Labeling (AAL) automatically divides the brain into 90 ROIs [9]. Zhu et al. [18] selected the ROIs that have a certain influence on AD by principal component analysis (PCA), but this approach ignored the similarity between different ROIs. In statistics, the T-test is a two-sample location test used to test the hypothesis that two populations have equal means. Its unequal-variance form is more reliable when the two samples have unequal variances and unequal sample sizes. It is typically applied when the statistical units underlying the two samples being compared are non-overlapping. In view of these characteristics, the T-test was used to select ROIs that have a similar effect on AD. In [14], a subset of measures with a low level of mutual correlation was determined first; then, through T-tests, another subset of measures capable of separating the two groups was identified; the intersection of these subsets yielded the final feature set. The intersection method leads to repeated calculation and reduces speed. We therefore first selected ROIs with low mutual relevance by T-test, and then applied PCA to the selected ROIs to keep those with a high influence on classification.

The accuracy of AD versus NC classification is usually above 90% for most existing methods, but the accuracy of MCI versus NC classification is still not satisfactory [22]. To improve the accuracy of AD classification, and of MCI classification in particular, this paper proposes a novel diagnosis method. First, the subject-labeled image is obtained with the AAL template. Next, structures of interest in the brain are selected automatically based on the T-test, and the principal components of the selected structures are extracted with PCA. Finally, individual subjects are classified by CRC. Experiments showed the effectiveness of the proposed method.

The rest of this paper is organized as follows. Section 2 presents the details of the proposed algorithm. Section 3 reports the experimental results. Finally, Section 4 concludes the paper and offers suggestions for future work.

2 Method

This section details the proposed method, which can be divided into four parts: image pre-processing, selection of the structures of interest, feature extraction, and classification. Figure 1 illustrates the general framework of our method.

Fig. 1 Framework of the proposed method

2.1 Notations

Let \( {X}_j=\left[{x}_{j1},\cdots, {x}_{ji},\cdots, {x}_{jn_j}\right]\in {\Omega}^{d\times {n}_j} \) be a d × nj matrix that represents the d features of the nj samples of class j, where j = 1, 2. \( \overline{X_j}=\left[\overline{x_1},\cdots, \overline{x_i},\cdots, \overline{x_{n_j}}\right]\in {\Omega}^{1\times {n}_j} \) represents the mean vector of class j. \( {s}_j=\left[{s}_1,\cdots, {s}_i,\cdots, {s}_{n_j}\right]\in {\Omega}^{1\times {n}_j} \) represents the standard deviation of class j. B1, B2, ⋯, B90 represent the coordinates of the first 90 structures of the AAL template.

2.2 Pre-processing

Pre-processing is an important step in classification; its primary purpose is to provide better conditions for the subsequent automatic processing. Pre-processing consists of the following steps: 1. Normalization: mapping the gray-scale of every tissue in all images into a standard gray-scale range; 2. Image aligning: bringing corresponding positions of the images into alignment; 3. Image segmentation: in MR image processing, often only the intracranial area is processed, and white matter, gray matter and cerebrospinal fluid may be processed separately, so the brain must be segmented into white matter, gray matter, cerebrospinal fluid and skull.

Different imaging environments may produce different gray-scale values at the same place of the same object. This is an unfavorable factor for medical image recognition and diagnosis and should be overcome before further processing. The gray-scale of the images should therefore be normalized. Normalization is an image conversion method that reduces or even eliminates gray-scale inconsistency in the image while retaining the gray-scale differences that have diagnostic value. The normalization method has been integrated into the 3DVIEWNIX system, so we applied it directly in this paper.

Because of the complexity of the gyri in MR images, corresponding positions in different images often contain different tissues, so the images must be aligned. All samples were first aligned with the Montreal Neurological Institute (MNI) brain. The AAL template divides the brain into 116 structures, of which 90 belong to the cerebrum and 26 to the cerebellum. Only the structures of the cerebrum were used. The alignment was performed with the SPM software package.

The Segment function of the SPM software package performs well and has been widely used. In this paper, white matter, gray matter and cerebrospinal fluid in the MR images were segmented with SPM.

2.3 Selecting the structures of interest

In this paper, we propose a method for selecting structures of interest based on the T-test. First, the MR images were automatically parcellated into 90 anatomical structures using SPM. Then we calculated the significance level of every dimension with the T-test. Next, we computed the mean significance value within every structure. Finally, we selected the structures that are significant for later processing.

Because the population standard deviations are unknown, we used formula (1) to calculate t:

$$ \boldsymbol{t}=\frac{{\overline{X}}_1-{\overline{X}}_2}{\sqrt{\frac{{S_1}^2}{n_1}+\frac{{S_2}^2}{n_2}}} $$
(1)

The lower the t value, the higher the significance level of the feature. After calculating the t value of every dimension of the samples, we obtained the average t value of every structure, \( \overline{t}= mean\left({t}_{i1},{t}_{i2},\cdots, {t}_{in}\ \right) \). We then sorted the structures according to their \( \overline{t} \) values and selected the first n structures as the structures of interest. The best value of n is determined experimentally in Section 3.3.
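As a minimal sketch of the selection step, the Python code below computes the t statistic of formula (1) for every feature and averages it within each structure. It is illustrative only: the function names, the `structure_of` label list and the use of the absolute value of t before averaging are assumptions made here, and the sort direction is left as a parameter because it depends on how the significance value is defined.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """t statistic of formula (1) for two 1-D samples with unknown variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_structures(class1, class2, structure_of, ascending=True):
    """Rank structures by their mean per-feature |t| value.

    class1, class2: lists of samples, each sample a list of d feature values.
    structure_of:   list of length d with the (hypothetical) structure label
                    of every feature, e.g. "B1" ... "B90".
    ascending:      sort direction of the mean value (an assumption).
    """
    per_structure = {}
    for k, label in enumerate(structure_of):
        t = welch_t([s[k] for s in class1], [s[k] for s in class2])
        # Taking |t| avoids cancellation of signs inside a structure (assumption).
        per_structure.setdefault(label, []).append(abs(t))
    t_bar = {label: mean(values) for label, values in per_structure.items()}
    order = sorted(t_bar, key=t_bar.get, reverse=not ascending)
    return order, t_bar
```

Selecting the first n entries of `order` then gives the n structures passed on to feature extraction.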

2.4 Feature extraction

2.4.1 The extraction of LBP texture feature

The LBP computation can be described as follows. For any 3 × 3 neighborhood, the texture T of the basic LBP operator is defined by its 9 pixels:

$$ T=t\left({g}_c,{g}_0,\cdots {g}_7\right) $$
(2)

where gc is the gray value of the central pixel and (g0, g1, g2, g3, g4, g5, g6, g7) are the gray values of its 8 neighbors. The 8 neighbors are coded as follows: a neighbor whose value is higher than or equal to that of the central pixel is coded as 1, otherwise as 0. This can be expressed by formula (3):

$$ T\approx t\left(s\left({g}_0-{g}_c\right),s\left({g}_1-{g}_c\right),\cdots, s\left({g}_7-{g}_c\right)\right) $$
(3)

where

$$ s(x)=\left\{\begin{array}{ll}1,& \mathrm{if}\ x\ge 0\\ {}0,& \mathrm{otherwise}\end{array}\right. $$

The resulting binary code is then converted to a decimal value by formula (4):

$$ LBP=\sum \limits_{p=0}^7{2}^ps\left({g}_p-{g}_c\right) $$
(4)

The original LBP operator computes the texture feature only within a 3 × 3 block, so its scale is small. It yields 256 possible patterns.
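The basic operator of formulas (2) to (4) can be sketched in a few lines of plain Python. This is an illustration rather than the implementation used in the experiments; the function name and the clockwise ordering of the neighbors g0 to g7 are assumptions (any fixed ordering gives an equivalent code up to a permutation of bits).

```python
def lbp_3x3(image):
    """Basic LBP map of a 2-D list of gray values; border pixels are skipped."""
    # (row, col) offsets of g0..g7, clockwise from the top-left neighbor (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            gc = image[r][c]
            code = 0
            for p, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= gc:  # s(g_p - g_c) = 1
                    code += 2 ** p               # weight 2^p of formula (4)
            row_codes.append(code)
        out.append(row_codes)
    return out
```

For an image of size H × W the result has size (H − 2) × (W − 2), one code in the range 0 to 255 per interior pixel.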

The original LBP operator cannot capture texture features of large-scale structures. To address this problem, the authors of [20] modified LBP and proposed the circular LBP operator. Specifically, in a given region of the image, a circular neighborhood of radius R (R > 0) is defined, as shown in Fig. 2, and P (P > 0) sampling points g0, g1, ⋯, gP−1 are distributed uniformly on its circumference.

Fig. 2 Several circular neighborhoods with different P and R

The coordinates (xp, yp) of the p-th point in the neighborhood can be computed by formula (5).

$$ \left({x}_p,{y}_p\right)=\left({x}_c+R\mathit{\cos}\left(2\pi p/P\right),{y}_c+R\mathit{\sin}\left(2\pi p/P\right)\right) $$
(5)

where (xc, yc) is the central pixel. The circular operator cannot ensure that all sampling points fall on integer pixel positions, so bilinear interpolation is used to obtain the value at each sampling point; the formulas above are then used to calculate the texture feature.
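The sampling step of formula (5) together with bilinear interpolation can be sketched as follows. The helper names and the `image[row][col]` indexing convention are assumptions of this sketch, and the caller is expected to keep the center at least R pixels away from the image border.

```python
import math


def bilinear(image, x, y):
    """Bilinearly interpolate image (indexed image[y][x]) at a real-valued (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def circular_neighbors(image, xc, yc, P, R):
    """Gray values g_0 .. g_{P-1} sampled on a circle of radius R around (xc, yc)."""
    values = []
    for p in range(P):
        xp = xc + R * math.cos(2 * math.pi * p / P)  # formula (5)
        yp = yc + R * math.sin(2 * math.pi * p / P)
        values.append(bilinear(image, xp, yp))
    return values
```

Thresholding the returned values against the central gray value with s(x) gives the bit string used by the circular LBP variants.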

When P points are sampled, formula (4) produces \( {2}^P \) patterns. Because too many patterns are a disadvantage for classification, the uniform pattern was introduced in [20] to solve this problem. A uniform pattern contains at most two transitions, i.e. from 0 to 1 or from 1 to 0, in the corresponding circular binary string. The number of transitions is computed by formula (6), and restricting the operator to uniform patterns reduces the number of patterns to P × (P − 1) + 2.

$$ U\left({LBP}_{P,R}\right)=\left|s\left({g}_{P-1}-{g}_c\right)-s\left({g}_0-{g}_c\right)\right|+\sum \limits_{p=1}^{P-1}\left|s\left({g}_p-{g}_c\right)-s\left({g}_{p-1}-{g}_c\right)\right| $$
(6)
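The uniformity measure of formula (6) simply counts the 0/1 transitions in the circular bit string. The short sketch below (function names are illustrative) implements it and checks the pattern count P × (P − 1) + 2 by enumeration.

```python
def uniformity(bits):
    """U of formula (6): number of bit changes around the circular string."""
    # bits[p - 1] with p = 0 is bits[-1], which supplies the wrap-around term.
    return sum(bits[p] != bits[p - 1] for p in range(len(bits)))


def count_uniform_patterns(P):
    """Count the P-bit patterns with at most two transitions."""
    count = 0
    for code in range(2 ** P):
        bits = [(code >> p) & 1 for p in range(P)]
        if uniformity(bits) <= 2:
            count += 1
    return count
```

For P = 8 the enumeration gives 58 uniform patterns out of 256, in agreement with 8 × 7 + 2.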

In addition, [20] also proposed a rotation-invariant LBP operator, whose features are invariant to image rotation. It is defined as follows:

$$ {LBP}_{P,R}^{ri}=\min \left\{ ROR\left({LBP}_{P,R},i\right)\mid i=0,1,\cdots, P-1\right\} $$
(7)

where ROR(x, i) is the rotation function, which circularly shifts the P-bit number x to the right by i positions (i < P). Introducing rotation invariance makes LBP more robust to image rotation and reduces the number of patterns dramatically. The rotation-invariant operator can also be combined with the uniform pattern, which gives:

$$ {LBP}_{P,R}^{riu2}=\left\{\begin{array}{ll}\sum \limits_{p=0}^{P-1}s\left({g}_p-{g}_c\right),& U\left({LBP}_{P,R}\right)\le 2\\ {}P+1,& \mathrm{otherwise}\end{array}\right. $$
(8)
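A compact sketch of the two rotation-invariant variants is given below. It follows the commonly used definition in which the uniform case of the riu2 operator is the number of 1 bits in the pattern; the function names are assumptions of this sketch.

```python
def rotate_right(code, i, P):
    """Circular right shift of a P-bit integer by i positions (the ROR function)."""
    i %= P
    return ((code >> i) | (code << (P - i))) & ((1 << P) - 1)


def lbp_ri(code, P):
    """Rotation-invariant code of formula (7): minimum over all P rotations."""
    return min(rotate_right(code, i, P) for i in range(P))


def lbp_riu2(bits):
    """Rotation-invariant uniform code: count of 1 bits if U <= 2, else P + 1."""
    P = len(bits)
    transitions = sum(bits[p] != bits[p - 1] for p in range(P))
    return sum(bits) if transitions <= 2 else P + 1
```

With this definition the riu2 operator takes only P + 2 distinct values (0 to P for uniform patterns, P + 1 for all others), which is what makes its histogram so compact.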

2.4.2 The extraction of Gabor wavelet feature

The 2D Gabor wavelet filter is defined as follows:

$$ \psi \left(x,y,\omega, \theta \right)=\frac{1}{2\pi {\sigma}^2}{e}^{-\frac{{x^{\prime}}^2+{y^{\prime}}^2}{2{\sigma}^2}}\left[{e}^{i\omega {x}^{\prime }}-{e}^{-\frac{\omega^2{\sigma}^2}{2}}\right] $$
(9)

where x′ = x cos θ + y sin θ, y′ = − x sin θ + y cos θ, (x, y) is the spatial coordinate, ω is the radial center frequency, and σ is the standard deviation of the Gaussian function along the x and y axes. The Gabor feature of an image I is calculated by formula (10):

$$ {O}_{m,n}\left(x,y\right)=I\left(x,y\right)\ast \psi \left(x,y,{\omega}_m,{\theta}_n\right) $$
(10)

With m scales and n directions, formula (10) yields m × n Gabor filters and therefore a large number of features. For example, the real parts of the Gabor filters with \( \omega =\frac{\pi }{2} \) and 8 directions \( \theta =\left(0,\frac{\pi }{8},\cdots, \frac{7\pi }{8}\right) \) are displayed in Fig. 3. Convolving these filters with an image gives eight feature images.

Fig. 3 The real parts of the filters

The filter responses are complex-valued, with real and imaginary parts. In this paper, their absolute values were used as the feature images:

$$ {I}_{m,n}= abs\left({O}_{m,n}\right) $$
(11)
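Formulas (9) to (11) can be illustrated with the following plain-Python sketch. It is not the implementation used in the experiments: the kernel half-size, the value of σ and the function names are hypothetical, and the convolution is a direct "valid" convolution that is only practical for small inputs.

```python
import cmath
import math


def gabor_kernel(omega, theta, sigma, half_size):
    """Sample formula (9) on a (2*half_size + 1) x (2*half_size + 1) grid."""
    dc = math.exp(-(omega ** 2) * (sigma ** 2) / 2)  # DC-compensation term
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # x'
            yr = -x * math.sin(theta) + y * math.cos(theta)  # y'
            envelope = math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            envelope /= 2 * math.pi * sigma ** 2
            row.append(envelope * (cmath.exp(1j * omega * xr) - dc))
        kernel.append(row)
    return kernel


def gabor_magnitude(image, kernel):
    """'Valid' convolution of formula (10) followed by abs() as in formula (11)."""
    k = len(kernel)
    out = []
    for r in range(len(image) - k + 1):
        out_row = []
        for c in range(len(image[0]) - k + 1):
            acc = 0j
            for i in range(k):
                for j in range(k):
                    # The kernel is flipped, so this is a true convolution.
                    acc += image[r + i][c + j] * kernel[k - 1 - i][k - 1 - j]
            out_row.append(abs(acc))
        out.append(out_row)
    return out


# Eight orientations 0, pi/8, ..., 7*pi/8 at one scale (sigma = 2 is an assumption).
bank = [gabor_kernel(math.pi / 2, n * math.pi / 8, 2.0, 2) for n in range(8)]
```

Each kernel in `bank` yields one feature map, so one scale with eight orientations gives eight maps per image.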

2.4.3 Multi-feature extraction

The texture features of NC, MCI and AD samples differ. In this paper, LBP and Gabor wavelets were used to extract the texture features of the image, and collaborative representation was then used for classification.

Because these two operators are better suited to 2D images, the method in this paper transforms each 3D image into 2D images. For example, if the pre-processed 3D image is \( D\in {\Omega}^{91\times 109\times 91} \), each 2D slice has size 91 × 109. Let D(:, :, n) denote the 2D matrix obtained by fixing the third dimension of D at n; the transformation is then I1 = D(:, :, 1), I2 = D(:, :, 2), ⋯, I91 = D(:, :, 91), and I = [I1, I2, ⋯, I91]. The features of I were then computed with the LBP operator and the Gabor filters. In the LBP computation, every pixel except those on the outermost ring of the matrix yields one feature value, so the output of the LBP operator can be regarded as a feature map; the original LBP operator was used in this paper. The Gabor features were computed by convolving the image with every Gabor filter and taking the absolute value of the response. Every Gabor filter generates one feature map, so m × n feature maps are generated in total. Because the feature maps correspond pixel by pixel to the original image, the ROIs can be extracted from them with the AAL template. After the ROIs were extracted, the dimension was reduced by PCA and the residual was computed with the collaborative representation algorithm. The work in [15] verified that using multiple features benefits collaborative representation classification and obtained good results in face recognition. In this paper a simpler scheme was used: formula (12) computes the combined residual over the multiple feature maps, and this combined residual is used to classify the MR image.

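$$ sum{r}_i=\sum \limits_{j=1}^n\frac{r_{i,j}}{R_j} $$
(12)

where ri,j is the residual of the j-th feature for the i-th class, n is the number of feature maps, and Rj =  max (r1,j, r2,j, ⋯) is the largest residual of the j-th feature over all classes.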
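The fusion rule of formula (12) can be sketched as below. The sketch reads the normalizer of each feature as the largest residual of that feature over all classes, which is an interpretation of the definition above; the function name and the nested-list layout are also assumptions.

```python
def fuse_residuals(residuals):
    """Combine per-feature residuals into one score per class.

    residuals[i][j] is the residual of class i under feature map j.
    Returns the fused residual of every class and the index of the predicted class.
    """
    n_classes, n_features = len(residuals), len(residuals[0])
    # R[j]: largest residual of feature j over all classes (assumed normalizer).
    R = [max(residuals[i][j] for i in range(n_classes)) for j in range(n_features)]
    fused = [sum(residuals[i][j] / R[j] for j in range(n_features))
             for i in range(n_classes)]
    predicted = min(range(n_classes), key=fused.__getitem__)
    return fused, predicted
```

Dividing by the per-feature maximum puts residuals from feature maps with different dynamic ranges on a comparable scale before they are summed.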

2.5 Classification

Collaborative Representation based Classification (CRC) is an efficient method for pattern recognition, and we used this method for MR images classification.

Let A = [a11, a12, ⋯, a1p, ⋯, ak1, ak2, ⋯, akp] be the matrix of training samples, where aij is the j-th sample of the i-th class. Using A as the dictionary matrix, the general model of collaborative representation is as follows:

$$ \widehat{x}=\underset{x}{\arg \min}\left\{{\left\Vert y- Ax\right\Vert}_2^2+\lambda {\left\Vert x\right\Vert}_2^2\right\} $$
(13)

where y is a test sample, represented as a linear combination Ax of the training samples, and ‖·‖2 denotes the l2 norm. \( \widehat{x} \) can be obtained in closed form by regularized least-squares estimation:

$$ \widehat{x}={\left({A}^TA+\lambda I\right)}^{-1}{A}^Ty $$
(14)

Let \( P={\left({A}^TA+\lambda I\right)}^{-1}{A}^T \). Because P does not depend on the input sample y, it can be computed once during the training phase. In the recognition phase, (15) is used to calculate \( \widehat{x} \).

$$ \widehat{x}= Py $$
(15)

After that, the class-wise residuals can be calculated by (16):

$$ {r}_i=\left\Vert y-{A}_i{\widehat{x}}_i\right\Vert $$
(16)

where Ai and \( {\widehat{x}}_i \) are the sub-dictionary and the coefficient vector of the i-th class. Finally, the identity of the test sample y is given by (17):

$$ \mathrm{identity}(y)={\mathrm{argmin}}_i\left\{{r}_{\mathrm{i}}\right\} $$
(17)
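The classification rule of formulas (13) to (17) can be sketched in plain Python as follows. This is a small illustrative version, not the Matlab implementation used in the experiments: it solves the regularized normal equations directly with Gaussian elimination instead of precomputing the projection matrix P, and the function names and data layout are assumptions.

```python
import math


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def crc_classify(train, labels, y, lam=0.001):
    """train: training vectors (the columns of A); labels: their classes; y: test vector."""
    n = len(train)
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    # (A^T A + lambda I) x = A^T y, i.e. formula (14) written as a linear system.
    gram = [[dot(train[i], train[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    x_hat = solve(gram, [dot(t, y) for t in train])
    residuals = {}
    for cls in set(labels):
        recon = [0.0] * len(y)  # A_i x_i: reconstruction from class cls only
        for coef, t, label in zip(x_hat, train, labels):
            if label == cls:
                recon = [r + coef * v for r, v in zip(recon, t)]
        residuals[cls] = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon)))  # (16)
    return min(residuals, key=residuals.get), residuals  # (17)
```

When many test samples are classified against the same dictionary, precomputing P as described above avoids solving the system once per sample.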

3 Experiments

3.1 Data and experimental settings

The experimental data used in this paper come from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, which was established in 2003 by the US National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Biological Engineering (NIBIB), the US Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations. ADNI was funded with 60 million dollars and recruited a large number of subjects from 50 sites in the United States and Canada. The recruited subjects were between 55 and 90 years old and included cognitively normal people, an early MCI group, a late MCI group and AD patients.

In our experiments, we used MR images from ADNI, comprising 50 AD patients, 55 MCI patients and 50 NC subjects. The image dimensions are 240 × 256 × 176 and the voxel size is 1 × 1 × 1.2. The numbers of females and males are almost the same. The MR scans are T1-weighted images acquired at 3 T. All experiments were implemented in Matlab 2014a.

To evaluate the performance of the proposed method, we randomly divided the data into two sets: testing data (20 AD, 20 MCI and 20 NC) and training data (30 AD, 35 MCI and 30 NC). We then classified the testing data with the CRC method and recorded the result. This process was repeated 100 times, giving 100 experimental results, and the mean accuracy over all repetitions is reported.

With the CRC method, the feature dimension should be smaller than the number of training samples, so we used PCA to reduce the feature dimension to 55. Based on our experimental results, λ was set to 0.001.
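The repeated random split described in this subsection can be sketched as follows. The function name, the `classify(train, sample)` callback signature and the fixed random seed are assumptions made for illustration; any classifier, including CRC, can be plugged in through the callback.

```python
import random


def repeated_holdout(samples_by_class, n_test_per_class, classify, repeats=100, seed=0):
    """Mean accuracy over repeated random train/test splits.

    samples_by_class: dict mapping a class label to its list of samples.
    classify:         callable (train, sample) -> predicted label, where train
                      is a list of (sample, label) pairs.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        train, test = [], []
        for label, samples in samples_by_class.items():
            shuffled = samples[:]
            rng.shuffle(shuffled)
            test += [(s, label) for s in shuffled[:n_test_per_class]]
            train += [(s, label) for s in shuffled[n_test_per_class:]]
        correct = sum(classify(train, s) == label for s, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

With 50 AD, 55 MCI and 50 NC samples and `n_test_per_class=20`, every repetition uses 60 test samples and 95 training samples, as in the protocol above.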

3.2 Effect of pre-processing

3.2.1 Normalization

Because of the large number of samples, this section examines the effect of gray-scale normalization on a subset only: we selected 15 images from the NC and AD groups and computed their histograms before and after gray-scale normalization. All histograms were plotted in one chart. The original images had 4096 gray levels. After processing, many gray-scale values were lost and the number of gray levels was reduced to 64. For example, the gray-scale values (656, 704, 752, 800, 848, 896) should map to (41, 44, 47, 50, 53, 56) on the 256-level gray scale, but the actual values were (44, 44, 48, 52, 56, 56). When the histogram was computed with 256 gray levels, many vertical lines appeared that disturbed observation. The number of histogram bins was therefore set to 64, and the gray-scale histograms of all samples were gathered into one chart, as shown in Fig. 4. Panels a, c and e show the gray matter, white matter and cerebrospinal fluid of the selected samples before normalization; panels b, d and f show the same tissues after normalization. The histograms in a, c and e overlap less than those in b, d and f, which indicates that gray-scale normalization is effective to a certain extent.

Fig. 4 Gray-scale histograms of GM, WM and CSF of 15 samples before and after normalization

3.2.2 Image aligning

The result of the Normalise function is not shown here because its output was poor. Instead, this section shows the result of using the Coregister function to register each sample to the average image. In Fig. 5, a and b are the original image and the AAL template, respectively; c is the MNI152T1 image used to build the AAL template; d is the image after alignment. The upper left, upper right and lower left parts of each panel show one slice in the coronal, sagittal and transverse planes, respectively. The images show that the samples were generally well aligned.

Fig. 5 Image registration (the left, middle and right images are the main view, side view and top view of the brain, respectively)

3.2.3 Image segmentation

The segmentation results are shown in Figs. 6 and 7. Figure 6 shows the segmentation of an image before pre-processing and Fig. 7 the segmentation after pre-processing, where a is gray matter, b is white matter, c is cerebrospinal fluid and d is the skull. The results produced by the software are not binary images: there is a gradual transition between the tissue boundary and the background. In fact, the data type of the results is unsigned int. In this paper, a threshold T = 30 was applied to convert the data to logical type, and the result was used as a mask in later processing. Manual segmentation is more accurate, but it is time-consuming and must be performed by a professional. Moreover, different operators produce different results, and repeated segmentations by the same operator also differ. Automatic segmentation is convenient and gives stable results, but it is less accurate than manual segmentation. Pre-processing by software saves time, and its results are fully reproducible.

Fig. 6 Segmentation results of the image before pre-processing (the left, middle and right images are the main view, side view and top view of the brain, respectively)

Fig. 7 Segmentation results of the image after pre-processing (the left, middle and right images are the main view, side view and top view of the brain, respectively)

3.3 Experiments with different numbers of structures of interest

We evaluated the performance of the method by computing the classification accuracy (ACC), the sensitivity (SEN), the specificity (SPE), the accuracy of positive samples (PAC) and the accuracy of negative samples (NAC). Table 1 defines the notation used. SEN, SPE, PAC and NAC are computed as follows:

$$ {\displaystyle \begin{array}{ll} SEN=\frac{TP}{TP+ FN}& SPE=\frac{TN}{TN+ FP}\\ {} PAC=\frac{TP}{TP+ FP}& NAC=\frac{TN}{TN+ FN}\end{array}} $$
Table 1 Confusion Matrix
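For reference, the measures above can be computed from the four confusion-matrix counts as in the short sketch below. The function name is illustrative, and ACC is taken to be the usual ratio of correctly classified samples to all samples, which the text does not spell out.

```python
def binary_metrics(tp, fn, tn, fp):
    """Evaluation measures from the confusion-matrix counts TP, FN, TN and FP."""
    return {
        "ACC": (tp + tn) / (tp + fn + tn + fp),  # assumed standard definition
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "PAC": tp / (tp + fp),
        "NAC": tn / (tn + fn),
    }
```

For example, with the hypothetical counts TP = 18, FN = 2, TN = 16 and FP = 4, the sensitivity is 18/20 = 0.9 and the specificity is 16/20 = 0.8.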

To investigate the effect of the number of structures of interest on the results, we repeated the experiment with different numbers of selected structures, ranging from 2 to 18, and calculated ACC, SEN, SPE, PAC and NAC for each setting. Table 2 shows the experimental results for all samples with different numbers of structures of interest.

Table 2 Accuracies of structures with different numbers (where the boldface is the highest accuracy)

The experimental results show that satisfactory accuracy is obtained with a small number of structures of interest, and that the accuracy decreases when the number of structures is large. This suggests that the structures selected by the proposed method carry the most useful information. The best accuracy was obtained with 12 structures of interest, so 12 structures are used in the remaining experiments of this paper.

3.4 Experiments for CRC classification of multi-feature image

In this experiment, the original LBP operator was used for the LBP features, and Gabor filters with scale \( {\omega}_{max}=\frac{\pi }{2} \) and 8 directions \( \theta =\left(0,\frac{\pi }{8},\cdots, \frac{7\pi }{8}\right) \) were used for the Gabor features. The ROIs of the LBP and Gabor feature images were extracted separately and their dimension was reduced to 60; collaborative representation was then used for classification. The results are shown in Table 3. They show that the classification performance was not better than that of collaborative representation on the original image. This is mainly reflected in worse results for MCI versus NC; only the results for NC versus AD improved slightly. Compared with classifiers applied to the original image, such as SVM and ELM, multi-feature collaborative representation still has certain advantages.

Table 3 Classification results of multi-feature CRC

3.5 Comparison of different classifiers

To investigate the effectiveness of the CRC method, we classified the data with different classifiers using the same structures of interest and compared their accuracies. All processing steps were identical except for the classifier. We used SVM and FLD to classify the feature vectors obtained before. Table 4 shows the results of these experiments.

Table 4 The comparison of SVM, FLD and CRC (where the boldface is the highest accuracy)

The results show that CRC performs much better than the other two classifiers. A possible reason is that the biological features associated with AD are sparse, which favors CRC. In other papers, however, the accuracy of SVM is good. This suggests that CRC is more robust to different features.

3.6 Comparison with existing methods

To investigate the effectiveness of the proposed method, we compared it with several existing single-modality and multi-modality methods. Table 5 shows the accuracy of the proposed method and of the single-modality methods. Table 6 shows the accuracy of the proposed method and of the multi-modality methods. The third and fourth columns give the accuracies.

Table 5 Accuracies of single modality methods and the proposed method (where the boldface is the highest accuracy)
Table 6 Accuracies of multi-modality methods and the proposed method (where the boldface is the highest accuracy)

The accuracy of the proposed method is better than that of most single-modality methods, but lower than that of Tong’s method, which requires the regions to be found manually. The proposed method therefore performs well among automatic single-modality methods. Compared with the multi-modality methods, however, the results of the proposed method are not outstanding. Because multi-modality data are useful for AD diagnosis, we will apply the proposed method to multi-modality images in future work.

4 Conclusions

This paper presented an AD diagnosis method based on the T-test and multiple features. The proposed method first uses the SPM software to pre-process the MR images, then selects structures of interest based on the T-test, extracts LBP texture and Gabor wavelet features as multiple features, and finally classifies the images by CRC to diagnose AD. The classification experiments were performed on 50 AD samples, 55 MCI samples and 50 NC samples. The experimental results for automatic classification of MR images were satisfactory, and the experiments with CRC demonstrated the effectiveness of the selected structures. However, only MR images were used in our experiments, which is not sufficient. In the future, we will therefore build feature vectors from multi-modality data such as PET images, MMSE scores and genetic data.