1 Introduction

Alzheimer’s Disease (AD) is the most common neurodegenerative disorder in the world, with more than 46 million people affected [2]. With the current population ageing in developed countries, this number is expected to increase to 131.1 million by 2050 [2]. An early diagnosis is therefore needed to enable early intervention and to improve the life expectancy and quality of life of affected subjects and their families.

Currently, one of the most widespread techniques to explore neurodegeneration in AD is Magnetic Resonance Imaging (MRI). It provides a non-invasive tool to explore the internal structure of the brain and the distribution of Gray Matter (GM) and White Matter (WM), which usually correlates with neurodegeneration [3, 11]. This is in contrast to other neuroimaging modalities, such as Single Photon Emission Computed Tomography (SPECT) or Positron Emission Tomography (PET), which require the injection of a radiopharmaceutical. Analysis of these images is usually performed visually or using semiquantitative tools to assess the degree of neurodegeneration.

In contrast to traditional visual and semiquantitative analysis, many fully automated systems for analysing MRI images have been proposed. Apart from the widely used Voxel Based Morphometry (VBM) [3], numerous feature extraction algorithms have been proposed. Some recent approaches include decomposition via Principal Component Analysis (PCA) [8, 25], Independent Component Analysis (ICA) [17] and Partial Least Squares (PLS) [22], or projecting brain information onto a bidimensional plane using Spherical Brain Mapping (SBM) [14, 18]. A recent review [10] reported a wide variety of algorithms using shape, volume and texture analysis for the diagnosis of AD. Of these, texture analysis has already been applied to neuroimaging of different modalities with great success [13, 16, 28].

In this work, we propose a system that combines brain region segmentation of T1-weighted images, using a strategy based on atlas masking, with a subsequent texture analysis of each region. The features extracted at each region are used to quantify which combinations of measures and regions are useful to characterize neurodegeneration in AD.

This article is organized as follows. First, the methodology used to analyse the MRI images and evaluate our system is detailed in Sect. 2. Later, in Sect. 3, the results are presented. Finally, we draw some conclusions about our system in Sect. 4.

2 Methodology

2.1 Atlas Segmentation

In this work we test the hypothesis that structural changes in different areas of the brain can predict neurodegeneration and can therefore be related to Alzheimer’s Disease (AD). We compute the structural changes at different regions using a naive atlas segmentation.

Atlas segmentation is a technique that uses a brain atlas, such as the Automated Anatomical Labelling (AAL) [26], the Montreal Neurological Institute (MNI) atlas or IBASPM [1], to mask out different regions in images registered to a common space. Since, in our case, the T1-weighted images are registered to the MNI space, we have applied the IBASPM atlas to extract the regions to which the subsequent texture analysis is applied.

The IBASPM atlas consists of 90 cortical and subcortical regions divided by hemisphere, which were constructed using three elements: a gray matter segmentation, the normalization transform matrix (the matrix used to map voxels from the individual to the MNI space) and the MaxPro MNI atlas (Fig. 1).

Fig. 1. Some axial cuts of the IBASPM atlas.
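The atlas masking step described above can be sketched as follows. This is a minimal illustration, assuming the atlas and the image are already registered volumes of identical shape; the function name and the region label are hypothetical, not part of the original pipeline.

```python
import numpy as np

def extract_region(image, atlas, label):
    """Return the voxel intensities of `image` inside the atlas region `label`.

    Both volumes must be in the same (MNI) space and have the same shape.
    """
    if image.shape != atlas.shape:
        raise ValueError("image and atlas must be registered to the same space")
    return image[atlas == label]

# Toy example: a 4x4x4 "image" with a 2-voxel region labelled 37.
atlas = np.zeros((4, 4, 4), dtype=int)
atlas[0, 0, :2] = 37
image = np.arange(64).reshape(4, 4, 4)
region = extract_region(image, atlas, 37)   # voxels 0 and 1
```

In practice one would keep the region as a masked sub-volume rather than a flat vector, so that the co-occurrence offsets of the next section remain meaningful.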

2.2 Haralick Texture Analysis

Texture analysis is usually based on the computation of a Gray-Level Co-occurrence (GLC) matrix, defined over an image as the distribution of co-occurring gray values at a given offset. Mathematically, we can define the co-occurrence matrix over an \(n\times m\) bidimensional image \(\mathbf {I}\) as:

$$\begin{aligned} \mathbf {C}_{\varDelta x, \varDelta y}(i,j)=\sum _{p=1}^n\sum _{q=1}^m{\left\{ \begin{array}{ll} 1, &{} \text {if }\mathbf {I}(p,q)=i\text { and }\mathbf {I}(p+\varDelta x,q+\varDelta y)=j \\ 0, &{} \text {otherwise}\end{array}\right. } \end{aligned}$$
(1)

where i and j are the different gray levels. For simplicity, the image is usually quantized to \(N_g\) gray levels. In this work we have used \(N_g=16\).
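As an illustration, a direct (unoptimized) implementation of Eq. (1), together with the uniform quantization to \(N_g\) gray levels, might look like this; the function names are our own:

```python
import numpy as np

def quantize(image, n_levels=16):
    """Uniformly quantize an image to gray levels 0 .. n_levels-1."""
    lo, hi = image.min(), image.max()
    q = (image - lo) / (hi - lo + 1e-12) * n_levels
    return np.minimum(q.astype(int), n_levels - 1)

def glcm_2d(image, dx, dy, n_levels=16):
    """Co-occurrence matrix of Eq. (1) for the offset (dx, dy)."""
    c = np.zeros((n_levels, n_levels), dtype=int)
    n, m = image.shape
    for p in range(n):
        for q in range(m):
            p2, q2 = p + dx, q + dy
            if 0 <= p2 < n and 0 <= q2 < m:   # pairs falling outside are ignored
                c[image[p, q], image[p2, q2]] += 1
    return c

# Toy image quantized to 2 levels, horizontal offset (0, 1).
img = quantize(np.array([[0.0, 0.0, 9.0], [0.0, 9.0, 9.0]]), n_levels=2)
c = glcm_2d(img, 0, 1, n_levels=2)   # c == [[1, 2], [0, 1]]
```

Note that this sketch counts ordered pairs only; symmetric variants add the transposed matrix.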

The parametrization of the GLC matrix by the offset (\(\varDelta x, \varDelta y\)) makes it sensitive to rotation. Therefore, we use several offsets at different angles to achieve some degree of rotational invariance. For simplicity, we use the same distance d in all directions, and can therefore rewrite the offset vector \(\varDelta _\mathbf {p}\) as:

$$\begin{aligned} \varDelta _\mathbf {p} = d (\varDelta _x, \varDelta _y) \quad \text {so that} \quad \varDelta _x, \varDelta _y \in \lbrace -1, 0, 1\rbrace \end{aligned}$$
(2)

Using this parametrization, it is easy to extend the GLC matrix to a three-dimensional image \(\mathbf {I}\) of size \(n\times m\times k\), parametrized this time by a 3D offset [20]:

$$\begin{aligned} \mathbf {C}_{\varDelta _p}(i,j)=\sum _{\mathbf {p}=(1,1,1)}^{(n,m,k)}{\left\{ \begin{array}{ll} 1, &{} \text {if }\mathbf {I}(\mathbf {p})=i\text { and }\mathbf {I}(\mathbf {p}+\varDelta _\mathbf {p})=j \\ 0, &{} \text {otherwise}\end{array}\right. } \end{aligned}$$
(3)

We use thirteen spatial directions to compute the GLC matrices in 3D space [20]. In this work we use \(d=\lbrace 1, 2, 3\rbrace \), and therefore \(3\times 13=39\) GLC matrices are computed for each region. To extract features from these GLC matrices, let us define the probability matrix \(\mathbf {P}\) as:

$$\begin{aligned} \mathbf {P}(i,j) = \frac{\mathbf {C}_{\varDelta _p}(i,j)}{\sum _{i,j}\mathbf {C}_{\varDelta _p}(i,j)} \end{aligned}$$
(4)
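A sketch of the 3D case under these conventions, enumerating the thirteen directions and normalizing as in Eq. (4); names are hypothetical and the loops are written for clarity rather than speed:

```python
import itertools
import numpy as np

# The 13 unique 3-D directions of [20]: one offset from each antipodal pair of
# the 26 voxel neighbours (the opposite offset yields the transposed GLC
# matrix, so it carries no extra information).
DIRECTIONS = [o for o in itertools.product((-1, 0, 1), repeat=3) if o > (0, 0, 0)]

def glcm_3d(volume, direction, d, n_levels=16):
    """Co-occurrence matrix of Eq. (3) for `direction` scaled by distance d."""
    dz, dy, dx = (d * step for step in direction)
    cm = np.zeros((n_levels, n_levels), dtype=int)
    nz, ny, nx = volume.shape
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                z2, y2, x2 = z + dz, y + dy, x + dx
                if 0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx:
                    cm[volume[z, y, x], volume[z2, y2, x2]] += 1
    return cm

def probability_matrix(cm):
    """P of Eq. (4): the GLC matrix normalized to sum to one."""
    return cm / cm.sum()

# 13 directions x 3 distances = 39 GLC matrices per region.
vol = np.zeros((2, 2, 2), dtype=int)
cm = glcm_3d(vol, (0, 0, 1), d=1, n_levels=2)   # 4 valid voxel pairs
p = probability_matrix(cm)
```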

With this new matrix of probabilities, we can compute thirteen of the fourteen texture measures defined in the original Haralick paper [7] (the fourteenth, the maximal correlation coefficient, is usually omitted), with the following expressions:

$$\begin{aligned} f_1&= \sum \limits _i\sum \limits _j \mathbf {P}(i,j)^2\end{aligned}$$
(5)
$$\begin{aligned} f_2&= \sum \limits _{n=0}^{N_g-1} n^2 \left\{ \sum \limits _{|i-j|=n}\mathbf {P}(i,j)\right\} \end{aligned}$$
(6)
$$\begin{aligned} f_3&= \frac{\sum _i\sum _j ij\mathbf {P}(i,j) - \mu _x\mu _y}{\sigma _x\sigma _y}\end{aligned}$$
(7)
$$\begin{aligned} f_4&= \sum _i\sum _j (i-\mu )^2 \mathbf {P}(i,j)\end{aligned}$$
(8)
$$\begin{aligned} f_5&= \sum _i\sum _j \frac{\mathbf {P}(i,j)}{1+(i-j)^2}\end{aligned}$$
(9)
$$\begin{aligned} f_6&= \sum \limits _{k=2}^{2N_g} k \sum _{i+j=k}\mathbf {P}(i,j)\end{aligned}$$
(10)
$$\begin{aligned} f_7&= \sum \limits _{k=2}^{2N_g} (k-f_6)^2 \sum _{i+j=k}\mathbf {P}(i,j)\end{aligned}$$
(11)
$$\begin{aligned} f_8&= -\sum \limits _{k=2}^{2N_g} \sum _{i+j=k}\mathbf {P}(i,j) \log \left\{ \sum _{i+j=k}\mathbf {P}(i,j)\right\} \end{aligned}$$
(12)
$$\begin{aligned} f_9&= -\sum \limits _i\sum \limits _j \mathbf {P}(i,j)\log (\mathbf {P}(i,j)) \end{aligned}$$
(13)
$$\begin{aligned} f_{10}&= \text {VAR}\left\{ \sum _{|i-j|=k}\mathbf {P}(i,j)\right\} \end{aligned}$$
(14)
$$\begin{aligned} f_{11}&= -\sum \limits _{k=0}^{N_g-1} \sum _{|i-j|=k}\mathbf {P}(i,j) \log \left\{ \sum _{|i-j|=k}\mathbf {P}(i,j)\right\} \end{aligned}$$
(15)

where \(f_1\) is the Angular Second Moment (ASM), \(f_2\) is Contrast, \(f_3\) is Correlation, \(f_4\) is Sum of Squares: Variance, \(f_5\) is the Inverse Difference Moment, \(f_6\) is the Sum Average, \(f_7\) is the Sum Variance, \(f_8\) is Sum Entropy, \(f_9\) is Entropy, \(f_{10}\) is Difference Variance and \(f_{11}\) is Difference Entropy.
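As a worked example, a few of the measures above can be computed directly from \(\mathbf {P}\); on a uniform \(\mathbf {P}\) each value is easy to verify by hand. This is a sketch, not the authors' implementation:

```python
import numpy as np

def haralick_subset(p):
    """A few of the Haralick measures of Eqs. (5)-(15), computed from the
    probability matrix P of Eq. (4). Indices i, j run over gray levels."""
    ng = p.shape[0]
    i, j = np.indices((ng, ng))
    asm = float(np.sum(p ** 2))                   # f1, Angular Second Moment
    contrast = float(np.sum((i - j) ** 2 * p))    # f2, Contrast
    idm = float(np.sum(p / (1 + (i - j) ** 2)))   # f5, Inverse Difference Moment
    nz = p[p > 0]                                 # 0 log 0 is taken as 0
    entropy = float(-np.sum(nz * np.log(nz)))     # f9, Entropy
    return asm, contrast, idm, entropy

# On a uniform P over 16 levels: ASM = 1/256 and Entropy = log(256).
p = np.full((16, 16), 1 / 256)
asm, contrast, idm, entropy = haralick_subset(p)
```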

For the last two texture measures, \(f_{12}\) and \(f_{13}\), let us define:

$$\begin{aligned} p_x(i)&= \sum \limits _{j=1}^{N_g} \mathbf {P}(i,j)\end{aligned}$$
(16)
$$\begin{aligned} p_y(j)&= \sum \limits _{i=1}^{N_g} \mathbf {P}(i,j) \end{aligned}$$
(17)

Let us also denote by \(H_X\) and \(H_Y\) the entropies of \(p_x\) and \(p_y\), respectively, and define:

$$\begin{aligned} H_{XY}&= -\sum \limits _i\sum \limits _j \mathbf {P}(i,j)\log \mathbf {P}(i,j) \end{aligned}$$
(18)
$$\begin{aligned} H_{XY1}&= -\sum \limits _i\sum \limits _j \mathbf {P}(i,j)\log \left\{ p_x(i) p_y(j)\right\} \end{aligned}$$
(19)
$$\begin{aligned} H_{XY2}&= -\sum \limits _i\sum \limits _j p_x(i) p_y(j)\log \left\{ p_x(i) p_y(j)\right\} \end{aligned}$$
(20)

where \(H_{XY}\) is the joint entropy of X and Y, and \(H_{XY1}\) and \(H_{XY2}\) are the joint entropies that would result if X and Y were independent.

With all these notations, the last two measures, known as Information Measures of Correlation (IMC-1 and IMC-2), can be defined as:

$$\begin{aligned} f_{12}&= \frac{H_{XY}-H_{XY1}}{\max \lbrace H_X,H_Y\rbrace }\end{aligned}$$
(21)
$$\begin{aligned} f_{13}&= (1-\exp [-2(H_{XY2}-H_{XY})])^{1/2} \end{aligned}$$
(22)


With 39 GLC matrices per region, from each of which 13 measures are computed, we obtain 507 measures per region; with 90 regions, this makes a total of 45630 measures per patient. In one of the experiments we use all of these measures in the classification task, so a feature selection strategy is needed.

2.3 Feature Selection

Different feature selection methods have been proposed in the literature [15]. These methods use statistical measures to assess the significance of the Haralick measures; to do so, either a parametric or an empirical approach can be used. In this work, we have used both.

For the parametric approach, the well-known independent two-sample t-test has been used. This test computes the t-statistic for each element of the feature vector; its statistical significance can then be estimated using the t-distribution. The t-statistic is computed as:

$$\begin{aligned} t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma _{X_2}^2+\sigma _{X_1}^2}{n}}} \end{aligned}$$
(23)

where \(\bar{X}_i\) and \(\sigma _{X_i}^2\) are the mean and the variance within class i, and n is the number of subjects per class.
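Eq. (23) can be computed column-wise over a feature matrix. The following sketch assumes equally sized classes, as the shared n in the denominator implies, and uses the sample variance, since the paper does not specify the convention:

```python
import numpy as np

def t_statistic(x1, x2):
    """Column-wise t-statistic of Eq. (23) for two classes of n subjects each.

    x1, x2: (n, n_features) arrays. Sample variance (ddof=1) is assumed.
    """
    n = x1.shape[0]
    num = x1.mean(axis=0) - x2.mean(axis=0)
    den = np.sqrt((x1.var(axis=0, ddof=1) + x2.var(axis=0, ddof=1)) / n)
    return num / den

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
x1 = rng.normal([5.0, 0.0], 1.0, size=(50, 2))
x2 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))
t = t_statistic(x1, x2)   # |t[0]| large, |t[1]| small
```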

Alternatively, we can use the Kullback-Leibler (KL) divergence to assess statistical significance, by computing the symmetric KL measure as in [24]:

$$\begin{aligned} KL = \frac{1}{2}\left( \frac{\sigma _{X_2}^2}{\sigma _{X_1}^2}+\frac{\sigma _{X_1}^2}{\sigma _{X_2}^2}-2\right) +\frac{1}{2}\left( \bar{X}_1-\bar{X}_2\right) ^2\left( \frac{1}{\sigma _{X_1}^2}+\frac{1}{\sigma _{X_2}^2}\right) \end{aligned}$$
(24)

We then perform a battery of permutation tests to obtain the empirical distribution of the KL value for each feature, from which the p-values are derived.
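The KL measure and its permutation test can be sketched as follows; the number of permutations and the variance convention are our assumptions, since the paper does not specify them:

```python
import numpy as np

def kl_measure(x1, x2):
    """Symmetric KL measure of Eq. (24) for a single feature, assuming
    Gaussian class distributions (population variances, ddof=0)."""
    m1, m2 = x1.mean(), x2.mean()
    v1, v2 = x1.var(), x2.var()
    return (0.5 * (v2 / v1 + v1 / v2 - 2)
            + 0.5 * (m1 - m2) ** 2 * (1 / v1 + 1 / v2))

def permutation_pvalue(x1, x2, n_perm=200, seed=0):
    """Empirical p-value of the KL measure under random label permutations."""
    rng = np.random.default_rng(seed)
    observed = kl_measure(x1, x2)
    pooled = np.concatenate([x1, x2])
    n1 = len(x1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # random relabelling
        if kl_measure(pooled[:n1], pooled[n1:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)          # never exactly zero

# Well-separated toy feature: the permutation p-value is small.
rng = np.random.default_rng(1)
p_value = permutation_pvalue(rng.normal(5.0, 1.0, 40), rng.normal(0.0, 1.0, 40))
```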

2.4 Database and Preprocessing

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (https://adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see www.adni-info.org.

The database used in this article was extracted from ADNI1: Screening 1.5T (subjects with screening data) and contains 1075 T1-weighted MRI images, comprising 229 normal (NOR), 401 MCI and 188 AD images. In this work, only the first session of the 188 AD and 229 NOR subjects was used. The images were skull-stripped and then spatially normalized using the SPM software [6].

2.5 Evaluation

We have evaluated the texture measures by means of a classification analysis, in which different sets of measures from the training set are used to train a Support Vector Classifier (SVC) [27], and performance is estimated on the test set. We use a 10-fold cross-validation strategy [9]: the whole dataset is divided into 10 parts (folds), and each fold is used once as the test set for an SVC trained on the remaining 9. The procedure is repeated 10 times, and performance values of accuracy, sensitivity and specificity (with their corresponding standard deviations) are obtained.
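This evaluation loop maps naturally onto standard tooling. The sketch below uses scikit-learn's SVC with stratified 10-fold cross-validation on synthetic stand-in data; the actual SVC settings of the paper are not specified, so a linear kernel is an assumption:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def evaluate(features, labels, n_folds=10, seed=0):
    """Mean and standard deviation of SVC accuracy under k-fold CV."""
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = cross_val_score(SVC(kernel="linear"), features, labels, cv=cv)
    return scores.mean(), scores.std()

# Synthetic, well-separated stand-in for the per-region texture features.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, (60, 5)), rng.normal(3.0, 1.0, (60, 5))])
y = np.array([0] * 60 + [1] * 60)
acc, std = evaluate(x, y)
```

Stratified folds preserve the class proportions in each fold, which matters here since the AD and NOR groups are of unequal size.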

We have evaluated our system in two different experiments:

  • Experiment 1: We test our system with one feature at a time, and evaluate the performance obtained with that feature. Each feature is one texture measure at a certain distance and direction for a given region. From these performance values, we can estimate which texture measures provide higher discrimination.

  • Experiment 2: We pool together all measures computed in all regions, in all directions and at all distances. Then, we apply hypothesis testing to select the most significant measures at different p-value thresholds.

3 Results and Discussion

3.1 Experiment 1

Multiple performance results (90 regions, 13 texture measures, 13 directions and 3 distances) have been computed for this experiment. We cannot detail all 45630 accuracy, sensitivity and specificity values, so we examine their distribution using boxplots.

Fig. 2. Boxplot of the distribution of the accuracies obtained using all texture measures in each region.

In Fig. 2, we show a boxplot of the distribution of the accuracy values obtained using all texture measures in each region. Higher performance is obtained at the Hippocampus and its surrounding regions, especially at the right Hippocampus. The parahippocampal gyri and amygdalae (especially the left amygdala) also achieve notable results. The hippocampal and parahippocampal regions have received much attention in the literature, since they play a relevant role in memory. In particular, Gray Matter loss due to neurodegeneration has been consistently reported there [4, 5]. This neurodegeneration probably leads to a change in some of the texture measures used here, which reveals the usefulness of texture measures in the parametrization of AD.

We now focus on which texture measures perform best within the Hippocampus, our area of interest. For this purpose, we pool the performance values obtained at the right Hippocampus and plot them in Fig. 3, grouped by texture measure.

Fig. 3. Boxplot of the distribution of the accuracies obtained using different texture measures at the Hippocampus R.

In Fig. 3, one measure clearly stands out: the Angular Second Moment. Other measures, such as the Inverse Difference Moment and Entropy, obtain good, although more variable, performance.

3.2 Experiment 2

In this experiment, we test the performance that can be achieved using all measures and selecting the most significant ones by means of a hypothesis test. We have used two different strategies to assess significance: the Student’s t-test and the Kullback-Leibler (KL) divergence. Table 1 displays the values obtained for the different strategies, compared to the performance of using all measures computed at the right Hippocampus.

Table 1. Results of our system using all measures with selection, compared to performance in some regions and the Voxels As Features (VAF) approach [23].

We can see that the selection improves the system’s ability to detect changes related to AD, when compared to using the measures computed at each region, or even the best measure at the best-scoring region.

When compared to a commonly used voxel-wise baseline, Voxels As Features (VAF) [23], our system clearly outperforms the typical approach in which segmented GM and WM maps are used. When the whole segmented T1-weighted image is used, the difference is smaller, although both the system using feature selection and the one using a single texture measure at the hippocampus still achieve better performance, this time while providing a feature reduction of two orders of magnitude (from more than half a million voxels to thousands of measures).

These results highlight the potential of texture measures to characterize per-region structural changes due to neurodegeneration in MRI images. This is a preliminary analysis of the utility of these measures, using the classical Haralick texture analysis. More advanced techniques were developed in the late 1990s and 2000s, for example Local Binary Patterns (LBP) [19], the watershed transform [12] and the orientation pyramid (OP) [21]. These algorithms have previously been shown to outperform the Haralick texture features, and are valid candidates to further improve AD diagnosis and, eventually, to model the progression of AD, which is the real challenge.

4 Conclusions and Future Work

In this work, we have proposed a texture analysis framework for characterizing the structural changes that occur in Alzheimer’s Disease (AD). Our preliminary results show that texture measures are an excellent descriptor of structural changes in different regions of the brain. The most discriminative regions are located in and around the Hippocampus, and the discrimination accuracy obtained at these regions is close to 80%. Furthermore, we have shown that pooling all the texture measures and selecting the most significant ones achieves higher performance than any single region by itself, which suggests that other regions could also play a significant role in neurodegeneration and can be characterized by texture measures. The system using texture measures provides a significant feature reduction and obtains similar or even higher performance than the common Voxels As Features (VAF) baseline. In future work, we will extend this analysis with other texture features, apply it to patients with Mild Cognitive Impairment (MCI), and assess whether these measures can predict the conversion of MCI to AD and its progression.