Introduction

Alzheimer’s disease (AD), a progressive neurodegenerative disorder, is the most common cause of dementia in the elderly (Petrella et al. 2003). Early AD symptoms include memory loss and confusion with time or place, while patients with advanced AD often lose the ability to take care of themselves, communicate with others and recognize family members (Alzheimer’s Association 2010). The number of people with AD is increasing in ageing populations, which has a marked impact on healthcare systems, families and caregivers (Alzheimer’s Association 2010). A definitive diagnosis of AD relies on pathological confirmation obtained from an autopsy or biopsy, which is not always available. In the developed world, the diagnostic accuracy of AD varies from more than 90% in an academic clinical setting (Cummings et al. 1998) to a substantially lower percentage in a general community setting (Petrella et al. 2003), while in developing countries, the accuracy of AD diagnosis is much lower. For example, the rate of missed diagnosis has reached more than 75% in mainland China (China Alzheimer’s Project, 2011), and the AD cases that hospitals do confirm are often severe, which leaves few options for treatment. Therefore, accurate AD diagnosis is crucial prior to proper treatment: although current medical interventions cannot stop or cure this disorder, they could help slow the progression of AD symptoms.

Brain atrophy on magnetic resonance imaging (MRI) has been detected more consistently than decline on specific cognitive tests in patients with AD (Jack et al. 2004). Medial temporal lobe atrophy (especially in the entorhinal cortex, hippocampus, amygdala, and parahippocampal gyrus) is often seen in MRI images of AD patients (Jack et al. 2004; Kesslak et al. 1991; Petrella et al. 2003). The pathologic progression of AD is believed to develop from entorhinal cortex to hippocampus to neocortex (Karas et al. 2003), and pathological studies have demonstrated that amyloid plaques and neurofibrillary tangles are present in the hippocampus at the onset of AD (Delacourte et al. 1999; Petrella et al. 2003). Consequently, there is a need to develop objective, sensitive and easily-obtained imaging markers that may serve as supplements to current clinical and neuropsychologic tests to facilitate AD diagnosis (Petrella et al. 2003), especially in developing countries (Kalaria et al. 2008).

A number of approaches such as volumetric and subtraction MR techniques, shape and thickness analysis have been developed to improve the diagnostic accuracy of AD (Colliot et al. 2008; Desikan et al. 2009; Gerardin et al. 2009; Petrella et al. 2003). Classification with support vector machine (SVM) has been applied to whole brain MRI images of AD (Klöppel et al. 2008) and automated ROI segmentation on MRI images together with volumetric and thickness analysis has obtained high classification accuracy (Desikan et al. 2009). In addition, serial imaging measures have been proposed, including structural atrophy rate measured by longitudinal MRI scans and glucose metabolism changes obtained with serial positron emission tomography (PET) (Drzezga et al. 2003; Du et al. 2003; Fox and Freeborough 1997; Fox et al. 1999, 2000; Minoshima et al. 1997; Silverman et al. 2001; Small et al. 2000). A yearly decline in hippocampal volume and an increase in temporal horn volume have been found approximately 2.5 times greater in patients with AD than in control subjects (Jack et al. 1998). Further, the annual volume changes of the hippocampus, entorhinal cortex, ventricle, and whole brain in mild cognitive impairment (MCI) and AD have been studied (Jack et al. 2004, 2008).

Texture features play an important role in image analysis research and may develop into a useful clinical imaging tool. Various medical applications of texture analysis provide a quantitative means of analyzing and characterizing properties of tissues, physiological and pathological stages and reveal often invisible information on tissues of interest (Harrison et al. 2008). In neuroimaging studies, texture analysis has been used to detect lesions and abnormalities for quantifying contralateral differences in epilepsy (Namer et al. 2001), hippocampal sclerosis (Boniha et al. 2003) and cancer (Rangayyan et al. 2010), aiding the automatic delineation of cerebellar volumes (Saeed and Piri 2002), characterizing spinal cord pathology in multiple sclerosis (MS) (Mathias et al. 1999), and monitoring therapeutic response in MS patients (Zhang et al. 2003). 2D texture analysis has been applied to the classification of AD, including: 2D texture features of spatial gray-level dependence and a linear discriminant function used for differentiating AD from normal subjects (Freeborough and Fox 1998), texture features extracted from GLN (gray level nonuniformity) and RLN (run length nonuniformity) for classification (Kaeriyama et al. 2002), MRI features extracted for separability among AD, MCI patients, and matched controls (Liu et al. 2004), and texture analysis on PET brain images (Sayeed et al. 2002).

In recent years, 3D texture features have been developed; these capture more spatial information (along the extra dimension) and have shown higher sensitivity and specificity than 2D techniques (Kovalev et al. 2001). 3D texture features based on co-occurrence matrices (COM) separated MCD (MCI or mild AD) patients from controls (Kovalev et al. 2001). 3D texture analysis was also found to be a promising supplement to the current techniques for diagnosing autism (El-Baz et al. 2007). When 2D and 3D textures were compared in the classification of AD, 3D texture in the hippocampus performed better than 2D texture and could be an early indicator of AD (Kumar et al. 2005).

We hypothesized that the options in 3D texture analysis including regions of interest (ROIs) selection, feature extraction and selection, statistical analysis and classification are important to AD classification and the 3D texture analysis processing pipeline could be optimized to improve the accuracy of AD classification. In this preliminary study, we investigated the effects of these options on the accuracy of distinguishing AD from healthy subjects in order to improve AD classification and obtain a useful aid for AD diagnosis.

Materials and methods

Subjects

17 AD patients (8 males and 9 females, mean age 65.6 years with a range from 51 to 82 years) and 17 age- and gender-matched healthy controls (6 males and 11 females, age 65.2 ± 7.8 years with a range from 51 to 84 years) were included in this study. All participants underwent a clinical screening procedure including Mini-Mental State Examination (MMSE) (Folstein et al. 1975) and Clinical Dementia Rating (CDR) (Morris 1993) scores. All AD patients met National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association criteria for AD (McKhann et al. 1984) with (Chinese-version) MMSE scores < 15 and had no visible lesions on conventional MRI. To increase the likelihood of making a correct AD diagnosis, patients recruited in this study had moderately severe or severe dementia, based on CDR scores of 2 or 3, respectively. For normal controls, the inclusion criteria were: a normal neurological examination, a CDR (Hughes et al. 1982; Morris 1993) scale score of 0, normal cognition (MMSE score > 27), no history of neurologic and psychiatric conditions, and normal conventional MRI examinations. All participants were recruited and evaluated by Xuanwu Hospital, Beijing. Local ethical committee approval and written informed consent from these participants were obtained before initiating this study. The descriptive information for these subjects is listed in Table 1.

Table 1 Descriptive information for the subjects in this study

MRI acquisition

T1-weighted MR images of the 17 AD patients and 17 matched normal controls were acquired with a 3 T MRI scanner (Trio, Tim, VB15, Siemens, Erlangen, Germany) using a three-dimensional (3D) magnetization prepared rapid gradient echo (MP-RAGE) sequence and a multi-channel phased-array head coil (12 elements). The scanner had a maximum gradient strength of 40 mT/m and a maximum slew rate of 200 T m⁻¹ s⁻¹. The MRI scans were conducted in the Department of Radiology, Xuanwu Hospital in Beijing. The scan parameters were as follows: TR = 2,000 ms, TE = 2 ms, inversion time (TI) = 900 ms, flip angle = 9°, matrix = 256 × 224, FOV = 256 × 224 mm², slice thickness = 1 mm, slices = 176, and bandwidth = 210 Hz/pixel. Pre-scan normalization was selected to correct B1 inhomogeneity. In this study, gradient unwarping was not performed and parallel imaging technique (PAT) was not adopted, in order to maintain a higher signal-to-noise ratio (SNR).

ROI selection

3D ball-shaped ROIs were placed in the left and right temporal regions of the hippocampus and entorhinal cortex on the MR image of each subject with the MRI texture analysis software MaZda (version 4.6), in which the location and size of the ROIs could be manually adjusted and the ROIs could be saved for later use. MaZda was developed at the Institute of Electronics, Technical University of Lodz (TUL), Poland (Szczypinski et al. 2009). 3D ROI capability was a new feature of MaZda version 4.6. In order to investigate the impact of ROI selection on the accuracy of texture analysis, 3D ball-shaped ROIs were placed in three ways (named Type I, II, and III ROIs) for each subject: Type I ROIs, with a size of 0.08 in MaZda (equivalent to around 2,226 voxels) and a radius of 8.1 pixels, were placed in the regions of the hippocampus and entorhinal cortex and included part of the adjacent cerebrospinal fluid (CSF) (Fig. 1a); Type II ROIs, with a size of 0.05 in MaZda (equivalent to around 463 voxels) and a radius of 4.8 pixels, were placed within the hippocampus and entorhinal cortex (Fig. 1b); Type III ROIs, with a size of 0.03 in MaZda (equivalent to around 165 voxels) and a radius of 3.4 pixels, were placed in the central part of the hippocampus and entorhinal cortex (Fig. 1c). The three types of 3D ROIs were placed by one of the authors (J.Z., with some neuroanatomical expertise in the hippocampus and entorhinal cortex).
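The reported voxel counts are consistent with the analytic volume of a ball of the stated radius. The following minimal sketch checks this; it assumes the radii are expressed in voxel units with isotropic voxels, and the function and variable names are illustrative rather than part of MaZda.

```python
import math


def ball_volume_voxels(radius: float) -> float:
    """Analytic volume of a ball, in voxels, for a radius given in voxel units."""
    return 4.0 / 3.0 * math.pi * radius ** 3


# Radii reported for the three ROI types (in pixels).
roi_radii = {"Type I": 8.1, "Type II": 4.8, "Type III": 3.4}
roi_volumes = {name: ball_volume_voxels(r) for name, r in roi_radii.items()}

for name, volume in roi_volumes.items():
    print(f"{name}: radius {roi_radii[name]} -> about {volume:.0f} voxels")
```

A discrete ball drawn on the voxel grid would contain a slightly different number of voxels than this continuous volume, so the agreement should be read as approximate.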

Fig. 1 Illustrations of ROI selection in the hippocampus and entorhinal cortex. a Type I ROI (size = 0.08) (relatively large ROI that might include adjacent CSF). b Type II ROI (size = 0.05). c Type III ROI (size = 0.03)

3D texture analysis

Compared with 2D texture, 3D texture increases the dimensionality while keeping the rotation and reflection invariance (Kovalev et al. 2001). Over 100 3D texture features were extracted from the image histogram, gradient, co-occurrence matrix (COM) (Haralick et al. 1973), and run length matrix (RLM) (Galloway 1975) in the ROIs using the software MaZda (Strzelecki et al. 2006). The texture parameters used in the analysis are listed in Table 2. The default parameter settings of MaZda were used, with the following numbers of bits per pixel: 12 for histogram features, 6 for gradient features, 6 for COM features (with a distance of 1 between pairs of pixels), and 6 for RLM features.
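To illustrate how a COM feature is derived in 3D, the sketch below quantizes a toy volume to 6 bits, builds a symmetric co-occurrence matrix for a single offset at distance 1, and computes one Haralick-style feature (contrast). It is not MaZda's implementation: the rectangular toy volume (no ball-shaped mask), the single offset direction, the 12-bit input range and all function names are assumptions made for illustration.

```python
from itertools import product


def quantize(volume, bits, max_value):
    """Rescale integer intensities in [0, max_value] to 2**bits gray levels."""
    levels = 2 ** bits
    return [[[v * levels // (max_value + 1) for v in row] for row in plane]
            for plane in volume]


def cooccurrence_3d(volume, levels, offset=(0, 0, 1)):
    """Symmetric, normalized co-occurrence matrix for one 3D offset (dz, dy, dx)."""
    dz, dy, dx = offset
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    counts = [[0] * levels for _ in range(levels)]
    for z, y, x in product(range(nz), range(ny), range(nx)):
        z2, y2, x2 = z + dz, y + dy, x + dx
        if 0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx:
            a, b = volume[z][y][x], volume[z2][y2][x2]
            counts[a][b] += 1  # count the pair in both directions
            counts[b][a] += 1  # so that the matrix is symmetric
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no voxel pairs exist for this offset")
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """Contrast: squared gray-level differences weighted by pair probability."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))


# Toy 2 x 2 x 2 volume with hypothetical 12-bit intensities.
toy_volume = [[[0, 64], [128, 128]],
              [[4095, 4095], [0, 4095]]]
levels = 2 ** 6
com = cooccurrence_3d(quantize(toy_volume, 6, 4095), levels)
toy_contrast = contrast(com)
print(toy_contrast)
```

In a full analysis the matrix would typically be accumulated over several 3D directions and restricted to voxels inside the ROI.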

Table 2 Texture parameters used in the analysis

Feature selection was performed with 4 approaches: Fisher; POE+ACC, which combines the classification error probability (POE) with the average correlation coefficients (ACC); mutual information; and MI_PA_F (a combination of mutual information, POE+ACC and Fisher). The Fisher approach is widely used in multivariate analysis. In the MaZda program, it selected 10 features (extracted by different feature extraction approaches such as COM and RLM) by maximizing the Fisher coefficient, i.e., the ratio of between-class variance to within-class variance (Schürman 1996):

$$ F = \frac{D}{V} = \frac{\dfrac{1}{1 - \sum_{k=1}^{K} P_k^2} \sum_{k=1}^{K} \sum_{j=1}^{K} P_k P_j \left( \mu_k - \mu_j \right)^2}{\sum_{k=1}^{K} P_k V_k} \tag{1} $$

where F is the Fisher coefficient, D is the between-class variance, V is the within-class variance, μk and Vk are the mean and variance of class k, Pk is the probability of class k, and K is the number of classes.
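Eq. (1) can be evaluated for a single feature as in the sketch below. It assumes that the class probabilities are estimated from the class frequencies and that the within-class variances are population variances (dividing by the class size); these choices, the function name and the toy values are illustrative and are not taken from MaZda.

```python
def fisher_coefficient(samples_by_class):
    """Fisher coefficient F = D / V of one feature, following Eq. (1)."""
    n_total = sum(len(values) for values in samples_by_class.values())
    priors, means, variances = {}, {}, {}
    for label, values in samples_by_class.items():
        priors[label] = len(values) / n_total            # P_k (assumed: class frequency)
        mean = sum(values) / len(values)
        means[label] = mean                              # mu_k
        variances[label] = sum((x - mean) ** 2 for x in values) / len(values)  # V_k

    # Between-class variance D: double sum over classes, normalized by 1 - sum(P_k^2).
    between = sum(priors[k] * priors[j] * (means[k] - means[j]) ** 2
                  for k in priors for j in priors)
    between /= 1.0 - sum(p ** 2 for p in priors.values())

    # Within-class variance V: prior-weighted sum of the class variances.
    within = sum(priors[k] * variances[k] for k in priors)
    return between / within


# Hypothetical feature values for two classes (not data from this study).
toy_feature = {"AD": [1.0, 2.0, 3.0], "control": [5.0, 6.0, 7.0]}
toy_f = fisher_coefficient(toy_feature)
print(toy_f)
```

Ranking all features by this coefficient and keeping the 10 largest would correspond to the Fisher selection step described above.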

The POE+ACC approach selected 10 features by minimizing both the POE and the ACC between chosen features (Mucciardi and Gose 1971; Dash and Liu 1997). The mutual information approach selected the 10 texture features with the largest mutual information coefficients (i.e., the largest dependence between features and class categories). The MI_PA_F approach combined the features chosen by the other three approaches, giving 30 texture features in total for classification.
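The mutual information criterion can be sketched as follows. The example discretizes each feature into equal-width bins and measures its dependence on the class label in bits; the binning scheme, the number of bins, the function names and the toy data are assumptions for illustration and do not reproduce MaZda's estimator.

```python
import math
from collections import Counter


def mutual_information(values, labels, n_bins=2):
    """Mutual information (in bits) between a binned feature and class labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero bin width for constant features
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(bins, labels))
    p_bin, p_label = Counter(bins), Counter(labels)
    return sum((c / n) * math.log2((c / n) / ((p_bin[b] / n) * (p_label[l] / n)))
               for (b, l), c in joint.items())


def select_by_mutual_information(features, labels, k):
    """Return the names of the k features with the largest mutual information."""
    ranked = sorted(features,
                    key=lambda name: mutual_information(features[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical feature table: one list of values per feature name.
toy_labels = [0, 0, 1, 1]
toy_features = {
    "informative": [0.1, 0.2, 0.8, 0.9],
    "uninformative": [0.1, 0.9, 0.1, 0.9],
}
best = select_by_mutual_information(toy_features, toy_labels, k=1)
print(best)
```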

Statistical analysis and classification

To identify the impact of different analytic methods on the accuracy of classification, we first took the 34 MRI images as a whole, extracted texture features, and reduced the features with the Fisher approach. Then, we applied raw data analysis (RDA), principal component analysis (PCA), and linear discriminant analysis (LDA), each with a 1-NN (1-nearest neighbor) classifier, and non-linear discriminant analysis (NDA) with an artificial neural network (ANN) classifier. The analysis was performed using B11 (version 3.3), the companion software of MaZda. The results were considered an estimate of the classification accuracy for the whole data set (treated as the training set).

The MRI images were then split into a training set and a test set. 20 MRI images (10 AD patients and 10 controls) were randomly assigned to the training set and the remaining 14 MRI images (7 patients and 7 controls) were assigned to the test set. In our previous study on texture analysis of multiple sclerosis (MS) (Zhang et al. 2008), we found that the classification performance of 1-NN was lower than that of artificial neural network (ANN). Thus, the classification on the test set was performed with the non-linear ANN classifier in B11.
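The group-balanced random split described above can be sketched as follows. The subject identifiers, the random seed and the function name are hypothetical; the original assignment was not made with this code.

```python
import random


def stratified_split(subject_ids_by_group, n_train_per_group, seed=0):
    """Randomly assign a fixed number of subjects per group to the training set."""
    rng = random.Random(seed)  # fixed seed so that the split is reproducible
    train, test = [], []
    for group, ids in subject_ids_by_group.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        train += [(group, s) for s in shuffled[:n_train_per_group]]
        test += [(group, s) for s in shuffled[n_train_per_group:]]
    return train, test


# 17 AD patients and 17 controls, identified here by hypothetical indices.
subjects = {"AD": range(17), "control": range(17)}
train_set, test_set = stratified_split(subjects, n_train_per_group=10)
print(len(train_set), len(test_set))
```

Splitting within each group keeps the training set (10 + 10) and the test set (7 + 7) balanced between patients and controls.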

In order to test whether the 3D texture features are correlated with clinical measures, Pearson correlation analysis was performed with statistical software SPSS (13.0) on the 10 selected texture features (after feature selection with Fisher approach) and the MMSE scores.
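For reference, the Pearson correlation coefficient between one texture feature and the MMSE scores can be computed as in the minimal sketch below. The analysis in this study was run in SPSS; the function name and the toy values here are illustrative only and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical values: a feature that rises as the MMSE score falls.
toy_texture = [1.0, 2.0, 3.0, 4.0]
toy_mmse = [28, 26, 24, 22]
toy_r = pearson_r(toy_texture, toy_mmse)
print(toy_r)
```

The significance test that accompanies each coefficient is not shown here.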

Results

Texture analysis using all MR images as the training set

When using the whole data set as the training set, the 1-NN classification accuracy of the training set for raw data analysis (RDA), PCA and LDA is relatively low (63.2–89.7%), but the ANN classification accuracy is relatively high for NDA (92.6–98.5%) (Table 3). Regardless of the different analytic approaches (RDA, PCA, LDA or NDA), higher classification accuracy was obtained from the training set of Type I ROI compared with those of Type II and III ROIs (Table 3).

Table 3 Texture analysis
Table 4 Texture analysis

Classification results of the test set

Among the four feature selection and reduction approaches, the Fisher approach tended to generate relatively high classification accuracy regardless of ROI selection. The POE+ACC approach led to the same high classification accuracy (96.4%) as the Fisher approach for Type I ROI (Table 4). Although the MI_PA_F approach used more features for classification than the other feature selection approaches, it did not produce more accurate results; its classification accuracy was relatively low. Compared with Type II and III ROIs, the classification accuracy for Type I ROI was much higher for the test set, which is consistent with the observations obtained from the training set.

Correlation analysis

The correlation analysis between the 3D texture features and the MMSE scores showed that most of the texture features were significantly correlated with the MMSE scores. To examine the correlation results for a possible trend, 4 texture features (after feature selection and reduction) that were common to all 3 types of ROIs and significantly correlated with the MMSE scores were selected; their correlation results are summarized in Table 5. Among them, three COM texture features were negatively correlated with the MMSE scores, while one COM texture feature was positively correlated with them (Table 5). These findings suggest that the 3D texture features obtained from structural MR images are correlated with the severity of the cognitive impairment in AD that the MMSE scores reveal.

Table 5 Pearson Correlation between selected texture features and MMSE scores

Discussion

In this study, we examined 3D texture features of MRI images extracted from regions of the hippocampus and entorhinal cortex in AD and calculated the correlations between the texture features and the clinical measure of MMSE. Our preliminary results show that 3D texture analysis could characterize the differences of texture features in the tissues of ROIs in AD patients and normal controls.

Abnormalities in the medial temporal region in AD patients

3D texture analysis on MRI is performed by analyzing the gray tone variations among image voxels in the 3D ROI, which captures the spatial and intensity information arising from abnormalities of brain tissue in brain diseases. Since the voxel size of the MRI images in this study is 1 mm and the parameters of 3D texture analysis were set to no less than 6 bits per voxel for the texture extraction approaches (histogram, gradient, COM or RLM), 3D texture is able to detect subtle differences between the MR images of AD patients and those of controls. In this study, the differences in 3D texture between AD patients and normal controls in the hippocampus and entorhinal cortex reflect the abnormal spatial texture content or abnormalities in the medial temporal region in patients with AD. These abnormalities are characterized by the appearance of extracellular amyloid plaques and intracellular neurofibrillary tangles (Petrella et al. 2003), which disintegrate microtubules, collapse the neuron’s transport network, and damage the function of the neuron as well as the communication between neurons (NIA 2009). As AD progresses, widespread neuron death (or loss) leads to brain atrophy.

This study shows that ROI selection plays an important role in texture analysis. A relatively large ROI including part of the CSF near the hippocampus and entorhinal cortex generated a higher classification accuracy, while a smaller ROI within the hippocampus and entorhinal cortex generated a much lower classification accuracy. This may be because more distinctive 3D texture features could be extracted from the large ROIs (which include part of the brain surface), helping to distinguish the brain tissues of AD patients from those of normal controls.

The correlation results indicate that most 3D texture features were correlated with the MMSE scores, supporting the finding that AD atrophy rates of the hippocampus and entorhinal cortex were significantly correlated with MMSE scores (Jack et al. 2004). Progressive functional decline has been found to correlate with MMSE scores over the course of AD as follows: MMSE scores of 20–23 correspond to mild AD (e.g., short-term memory loss), 10–19 correspond to moderate AD (e.g., daily cognitive function impaired), and 0–9 correspond to severe AD (e.g., behavioral disturbance) (Petrella et al. 2003). Further, the onset and progression of cognitive symptoms in AD patients are thought to parallel the pathologic progression of AD-related brain destruction (Karas et al. 2003). Consequently, the correlation results may further suggest that the abnormalities in structural imaging detected by 3D texture are correlated with the severity of the cognitive impairment in AD that the MMSE scores represent.

Texture analysis as a data processing pipeline

From the perspective of data processing, texture analysis is a multi-step data processing procedure or pipeline which consists of ROI selection, feature extraction and selection, and classification. The final result of texture analysis, i.e., classification accuracy, is not determined by any single step, but the combination of the options and parameters selected for each step.

ROI selection plays an important role in texture analysis. Several MRI-based ROIs (including the hippocampus, the entorhinal cortex, the ventricle and even the whole brain) have been selected in AD. Due to the characteristics of AD progression, each ROI has a different atrophy rate, which reveals different levels of damage in the microstructures within it (Jack et al. 2004). This may lead to different texture features being extracted from the ROIs and selected for classification, and thus to different classification accuracy in texture analysis. The results of this study indicate that the texture analysis and classification results differed markedly when the ROI was placed in different parts of the hippocampus and entorhinal cortex region. For better texture analysis and classification, the key to ROI selection is to choose the ROI that maximizes the texture feature differences between AD patients and normal controls.

Feature extraction and selection obtain the texture features from the ROI and determine which features are used for classification. In our previous 2D texture analysis of multiple sclerosis (MS) (Zhang et al. 2008), we compared the 16 features extracted from the gray-level co-occurrence matrix (COM) alone with the combined 27 features extracted by 5 different feature extraction approaches: gradient matrix, COM, RLM, autoregressive model, and wavelet analysis. We found that the classification accuracy was higher with the combined texture features in more cases than with the COM features alone. However, the results of this study demonstrated that the classification accuracy with the combined set of 30 features was relatively low compared with that of the 10-feature sets selected by individual feature selection approaches. This indicates that a combined feature set with many texture features may not lead to high classification accuracy. In other words, classification accuracy does not depend on the number of features selected or on whether they are combined from different feature extraction or selection approaches, but on the combination of the options and parameters of each step in the texture analysis pipeline.

For data analysis and classification, the results demonstrated that NDA with ANN classifier performed much better than RDA, PCA or LDA with 1-NN classifier measured by both Fisher coefficient and classification accuracy (Table 3), which indicates that NDA with ANN classifier had more discriminative power than the other approaches with 1-NN classifier in analyzing the data of this study. This is consistent with the results of our previous study for MRI texture analysis of MS (Zhang et al. 2008). These results revealed that (1) the data used for both studies were probably comprised of linearly non-separable components which need non-linear hypersurfaces rather than linear hyperplanes to separate; and (2) RDA (without data transformation), PCA, or LDA (with linear transformation) could not classify linearly non-separable data components while NDA transformed the data with a non-linear transformation and made the data separable in a non-linear hyper space (Strzelecki et al. 2006; Szczypinski et al. 2009).

Taken together, such a texture analysis pipeline could be applied to different data sets (MR images of MS, AD, MCI, etc.), and the overall performance of the texture analysis pipeline is determined by the combination of the options and parameters of each step. Consequently, optimization of the texture analysis pipeline to obtain the best classification accuracy could be achieved by fine-tuning the options and parameters of each step along the pipeline.

Methodological issues

It is believed that there is no need for normalization when characterizing or comparing small, equal-sized ROI volumes (Kovalev et al. 2001). In addition, normalization could distort the ROIs of the MR images and destroy the 3D texture. In this study, since the ROIs in the hippocampus and entorhinal cortex were small (compared with the whole brain), the MRI image of each subject was kept in its own space and was not registered to a standard brain for normalization, in order to preserve the delicate 3D texture of the ROIs. Experiments regarding normalization could be conducted in our future studies in order to understand its impact on ROI distortion and 3D texture.

This study also has some limitations. First, the ROI selection was semi-automated, and the study shows that the placement of the ROI is critical. Although ROI selection with MaZda does not require manually tracing the 3D ROI (slice by slice), it does require manually adjusting the location and size of the ball-shaped ROI with the tools provided by MaZda. This technique is quicker than manual ROI tracing, but it is still time-consuming and lacks reproducibility. Since reproducibility is a very important criterion for a clinical imaging tool, further research should include placement of the ROI by an imaging specialist such as a neuroradiologist, as well as measurements of inter- and intra-observer variability. In addition, automatic ROI segmentation and selection of brain structures such as the hippocampus has been explored by several recent studies (Colliot et al. 2008; Desikan et al. 2009; Gerardin et al. 2009) and is promising for standardizing ROI selection and improving reproducibility. Hopefully, tools for automatic ROI selection will be provided by future versions of MaZda, which would make 3D texture analysis more practical, automated and convenient for possible future use in clinical settings. Second, the preliminary results of this study are encouraging and should be investigated further. For example, the sample size could be enlarged by recruiting more subjects in our future studies. In addition, cross-validation or a leave-one-out approach could be employed to improve classification and use the sample more effectively. Further, a comparison study between texture analysis and more established approaches such as volume and shape analysis could be performed in the future, and a mild or moderate AD group or an MCI group could be included to further test the 3D texture analysis and classification approaches. Finally, the discrimination power of 3D texture for AD (or MCI) abnormalities might be improved by combining the strengths of texture analysis with conventional approaches such as volume and shape analysis.
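The leave-one-out approach mentioned above can be sketched with a 1-NN classifier as follows. This is an illustration under stated assumptions (Euclidean distance, unscaled features, hypothetical function names and toy data), not the procedure implemented in B11.

```python
def nearest_neighbour_label(query, train_features, train_labels):
    """Label of the training sample closest to the query (squared Euclidean distance)."""
    closest = min(
        range(len(train_features)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(query, train_features[i])),
    )
    return train_labels[closest]


def leave_one_out_accuracy(features, labels):
    """Classify each sample with 1-NN trained on all the other samples."""
    correct = 0
    for i in range(len(features)):
        rest_features = features[:i] + features[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        predicted = nearest_neighbour_label(features[i], rest_features, rest_labels)
        correct += predicted == labels[i]
    return correct / len(features)


# Hypothetical one-feature data set with two well-separated groups.
toy_features = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
toy_labels = ["control"] * 3 + ["AD"] * 3
toy_accuracy = leave_one_out_accuracy(toy_features, toy_labels)
print(toy_accuracy)
```

Because every subject serves once as the test case, all samples contribute to both training and evaluation, which is useful with a sample as small as the one in this study.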

In summary, texture features play an important role in image analysis research and may develop into a useful clinical imaging tool. Texture analysis is not yet used clinically, but it is an important technique to investigate. In this study, we investigated 3D texture analysis on MR images of AD. We found that the classification accuracy of texture analysis in the regions of the hippocampus and entorhinal cortex varied from 64.3% to 96.4% depending on the ROI selection, feature extraction and selection options. In addition, we found that most of the selected 3D texture features were correlated with the MMSE scores. These findings indicate that 3D texture analysis could characterize the tissue difference between AD patients and normal controls, and that the 3D texture features extracted from structural MRI images could relate to the severity of AD cognitive impairment. These results suggest that 3D texture might be a useful aid in AD diagnosis. The findings of this preliminary study revealed some trends in the 3D texture analysis of AD, and more work needs to be done in the future to make it a truly useful supplement for AD diagnosis, meeting the demand for improving AD diagnosis, especially in developing countries.