1 Introduction

Alzheimer’s disease (AD) is a slowly progressing disorder that causes neurons to degenerate. AD is the most common form of dementia, accounting for 60% to 80% of all dementia diagnoses. It is a complex disease characterized by the deposition of \(\beta \)-amyloid and by disorganized neuronal microtubules [1], which contain \(\tau \) filaments associated with synapse failure or loss; the resulting slow degeneration of neurons causes memory loss and diverse cognitive impairments. The pathological and functional changes that lead to mental, functional and behavioral disabilities are thought to begin several years, possibly a decade, before any associated clinical signs appear. AD is most commonly identified in individuals above 65 years of age, and the affected population is reported to roughly double every five years beyond that age [2]. By 2050, it is estimated that one in every 85 individuals will be affected by AD [3]. Median survival after diagnosis ranges from 3 to 10 years depending on the age at which AD is diagnosed: it is between 7 and 10 years for cases diagnosed in their late 60s or early 70s, and drops to about 3 years for cases diagnosed in their 90s [4]. Recently, detailed guidelines have been proposed for prompt and effective diagnosis of AD at its preliminary stage, mild cognitive impairment (MCI) [5]; such early diagnosis is of considerable significance for early care and for delaying disease onset. MCI refers to a cognitive decline (e.g., memory slips) in AD patients that does not yet significantly affect activities of daily living [6, 7].

Commonly, MCI individuals exhibit two kinds of clinical evolution. Some MCI individuals eventually progress to AD (mAD) and are termed MCI converters, whereas the rest do not progress to AD and show stable MCI (aAD), termed MCI non-converters. Roughly 35 out of 100 MCI patients progress to dementia or AD within a 3-year window, with an annual conversion rate of about 5 to 10%. mAD and aAD individuals are distinguished by the severity of the amnesic deficit: aAD is identified when memory test scores fall more than 1.5 standard deviations below the normative value, and mAD when they fall between 1 and 1.5 standard deviations below the normative value [8, 9]. aAD individuals have moderate verbal memory impairment compared with mAD individuals and are considered to typify an earlier stage of the impairment, which is likely to be the best window for disease-modifying therapies. The variation between the two MCI sub-types may also account for differences in biomarker abnormalities, clinical history and treatment response [8]. Owing to the diversity of clinical presentations and underlying causes among MCI patients, there is no single treatment procedure for MCI. Moreover, the clinical history of MCI is more often heterogeneous than typical. Many factors may induce MCI, but not every factor causes the associated neurodegeneration. Hence, identifying the underlying cause is genuinely difficult in persons with cognitive decline, and effective diagnosis requires recognizing those MCI individuals whose decline is caused by AD; this diagnosis must be made promptly, while neurodegeneration is ongoing [10]. This issue of qualitative diagnosis can be addressed through the classification of mAD versus aAD. Since AD is a continuing neurodegenerative process, there are consistent variations between previously estimated and current clinical outcome scores, namely the Mini-Mental State Examination (MMSE) and the AD Assessment Scale Cognitive Subscale (ADAS-Cog). Hence, it is crucial to anticipate future clinical outcome values from information at earlier time points, which is particularly helpful for managing the progression of the disease. Still, the classification of AD is difficult, and standard approaches have exhibited low accuracy because of the complexity of the cortical thickness patterns in aAD patients [10, 11]. This constraint can be overcome by increasing the number of training samples so that the full range of complex patterns is represented, or by selecting informative features that capture the differences between the two populations. Recent advances in brain imaging have made it possible to analyze neurodegenerative disorders and support timely and precise intervention in AD [12,13,14]. Magnetic resonance imaging (MRI) is increasingly popular for AD-related examinations because it is noninvasive and painless, and it provides good contrast and fine structural resolution. Consequently, several investigations have used structural MRI (sMRI)-derived biomarkers, which capture volumetric changes in brain tissue and neurodegeneration, to classify AD populations [15,16,17,18,19,20].

Fig. 1
figure 1

Proposed architecture

Likewise, functional MRI (fMRI) is employed to describe the physiological response related to neurological activity and thus links functional and structural connectivity [21, 22], allowing neural disorders of the whole brain to be explained in terms of connectivity. In this research, sMRI is used for AD classification. The severity and grade of neurodegeneration are recognized with the help of the atrophy assessed from sMRI [23]. Therefore, sMRI-based feature extraction has attracted the interest of investigators performing AD classification. These approaches include morphological measurement processes, namely the volume of interest (VOI) of grey matter voxels obtained by automated image segmentation [24] and measures of the medial temporal lobe and the hippocampus [25]. Many machine learning methods have been applied to distinguish AD patients from elderly control groups using diverse biomarkers. The customary classifiers include artificial neural networks (ANN), support vector machines (SVM) and various classifier ensembles. Among these, SVM and its variants have been examined most thoroughly because of their good accuracy and ability to handle high-dimensional data.

Table 1 Information of ADNI subjects used in training dataset

2 Literature Survey

The SVM-type classifier proposed by Magnin et al. [26] starts with a learning phase on a training dataset of clearly distinguished cases with known conditions, i.e., labeled cases. The classifier then maximizes the margin between classes by constructing the optimal separating hyperplane, or group of hyperplanes, in one or more dimensions. At the testing stage, test data are classified on the basis of the learned hyperplanes. Broadly, the three-dimensional T1-weighted MRI of each case was automatically parcellated into ROIs, and the grey matter extracted from each ROI, as depicted in Fig. 1, was used as one of the features in the classification process. The multimodal classification method proposed by Zhang et al. [15] used a multi-kernel SVM on biomarkers comprising sMRI [20] and cerebrospinal fluid (CSF) [27] to distinguish AD or MCI from healthy cases. For the binary classification tasks, AD versus healthy and MCI versus healthy, their procedure achieved good accuracy for both AD and MCI, respectively. Liu et al. [28, 29] suggested a deep learning technique for multi-class classification among healthy cases, MCI non-converters, MCI converters and AD cases on the basis of eighty-three ROIs from sMRI images and the corresponding registered PET scans. A stacked auto-encoder (SAE) is applied for unsupervised learning to derive high-level features, and a softmax logistic regression is subsequently used as the classifier. Although the experimental outcomes appeared fairly good, it is debatable whether the de-noising in the SAE may hinder appropriate feature learning; hence, it has been troublesome to use in practice. Li et al. [30] proposed fine tuning of new features on the basis of principal component analysis (PCA), stability selection, dropout and multi-task learning, in which a restricted Boltzmann machine (RBM) was applied as the deep learning model. Ninety-three ROIs from PET and MRI, together with CSF biomarkers, were utilized. Ye et al. [31] presented a machine learning-based multimodal data fusion method employing PET, MRI, CSF, genetics and demography for AD-related analysis and functional connectivity study. More recently, Rama et al. [32] suggested an import vector machine (IVM)-based technique for multi-label classification, in which only a subgroup of features from sMRI was input to kernel-based logistic regression, thereby decreasing the computational cost. This approach employed a complete set of 65 ROIs as features for training and testing and attained a maximum accuracy of 70% when classifying AD, MCI and healthy cases, and 76.9% for the binary classification of AD versus healthy [32]. A joint regression and classification (JRC) algorithm was presented by Zhu et al. [33] and has been shown to be an effective procedure for diagnosing AD or MCI on the basis of multimodal images. A weight-based multimodal sparse representation classification (mSRC) was developed by Xu et al. [34] and used to distinguish AD or MCI on the basis of multimodal images.
A supervised within-class-similarity discriminative dictionary learning (SCDDL) method was extended into a kernel framework, using the multiple kernel learning algorithm proposed by Gonen and Alpaydin [35] to fuse features effectively (MKSCDDL); it was ultimately shown by Wu et al. [36] to be an efficient tool for face detection. Vierra et al. [37] successfully applied a deep learning method to the ADNI dataset by automatically inferring an optimal feature representation from the raw images without any feature preselection, leading to a more substantive and objective process. Although many methods have been suggested for classifying the various AD stages, it is quite difficult to obtain influential data when using comparatively small datasets. The work proposed here therefore aims to introduce an effective classification method that operates robustly on a small dataset. For this purpose, an effective SegNet-based feature selection method and a residual network (ResNet) are adopted for the multi-label classification of three progressive AD stages.

3 Research Materials

3.1 Structural MRI Dataset Augmentation

The datasets employed here were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). The ADNI database was launched in 2003 as a joint association of government and private sectors. Its main purpose is to investigate openly whether serial MRI, PET, other biomarkers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and the onset of AD. The multi-label classification among normal, MCI and AD is performed. For the selected cases given in Table 1, the features listed in Table 2 were extracted from sMRI for this research. The segmented brain regions chosen as classification features are depicted in Fig. 3; each region is distinguished from the other tissues in the transverse view of the sMRI. The sMRI images employed in this paper were acquired on 1.5T MRI scanners.

3.2 ADNI Participants

The ADNI dataset comprises more than 6000 cases aged between 18 and 96 years. From these, 240 cases were chosen for training; the chosen cases fulfilled the requirements stated in the ADNI protocol. A balanced dataset of 240 cases was constructed in the following manner:

  1. (1)

    80 normal cases: 40 females and 40 males, with age ranges 71–88 years and 71–86 years and mean age ± SD of 78.8 ± 4.8 years and 77.3 ± 4 years, respectively, and with a Mini-Mental State Examination (MMSE) score of 28.3 ± 1.3 (range 26–30).

  2. (2)

    80 MCI cases who had not converted to AD within 18 months: 40 males and 40 females, with age ranges 65–82 years and 67–87 years and mean age ± SD of 73.63 ± 5 years and 78.03 ± 5.8 years, respectively, and with an MMSE score of 26.4 ± 1.6 (range 25–30).

  3. (3)

    80 AD cases: 40 females and 40 males, with age ranges 70–91 years and 72–92 years and mean age ± SD of 79.8 ± 6.98 years and 79.03 ± 5.47 years, respectively, and with an MMSE score of 22.4 ± 2.6 (range 17–27).

Fig. 2
figure 2

SegNet architecture [44]

Fig. 3
figure 3

SegNet encoder architecture [44]

Table 1 lists some characteristics of the chosen cases. All sMRIs employed here were acquired on 1.5T scanners. The primary aim was to perform supervised multi-label classification among normal, MCI and AD cases on the basis of the proposed SegNet + ResNet-101 architecture depicted in Fig. 1. Hence, to obtain unbiased estimates of classifier performance, the chosen cases were divided at random into training and testing datasets. The model was trained on the training dataset, and the diagnostic performance measures, namely accuracy, sensitivity and specificity, were assessed separately on the testing dataset. The training and testing datasets had uniform age distributions and equal numbers of each gender.

4 Proposed Architecture for Feature Extraction and Classification

Segmentation of the ROIs of grey matter, white matter, hippocampus, cerebrospinal fluid and cortical thickness is considered; these form the basis for feature extraction, and the gyrification index is also measured. The whole brain is not used as the ROI for feature extraction in clinical AD diagnosis, since certain features are not distinctive or supportive of diagnosis and may degrade detection accuracy. A SegNet network is adapted and trained independently to segment selected brain regions from the sMRIs. The initial layers of SegNet learn basic features such as edges and circles, whereas the deeper layers learn more complex and more informative fine features. The features learned by the final de-convolution layer of SegNet correspond to the various segmented images associated with the three classification labels (CN, MCI and AD). The properties of every segmented image constitute strong, label-relevant features that cover all levels of the feature hierarchy, helping to improve classification performance. To further enhance the feature data for classification, a feature vector is created by combining the pixel intensity values of the segmented feature images. Finally, this feature vector is fed to a ResNet-101 classifier to classify every sMRI image with respect to the presence or absence of AD or MCI. The rationale for choosing ResNet as the classifier is its high classification performance on such attributes [38].
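To make the pipeline concrete, the following minimal PyTorch-style sketch shows how the segmented feature maps produced by a trained SegNet could be stacked and passed to a ResNet-101 classifier. The names `segnet` and `NUM_FEATURE_MAPS`, and the adapted first convolution, are illustrative assumptions rather than the exact implementation used here.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical sketch: fuse SegNet segmentation/feature outputs and classify
# them with ResNet-101. Names below (segnet, NUM_FEATURE_MAPS) are assumptions.
NUM_FEATURE_MAPS = 7    # e.g., GM, WM, gyri/sulci, cortical surface, hippocampus, thickness, CSF
NUM_CLASSES = 3         # CN, MCI, AD

def build_classifier():
    # Standard torchvision ResNet-101; first conv adapted to accept the stacked
    # segmentation maps, final layer adapted to the three labels.
    resnet = models.resnet101(weights=None)
    resnet.conv1 = nn.Conv2d(NUM_FEATURE_MAPS, 64, kernel_size=7, stride=2,
                             padding=3, bias=False)
    resnet.fc = nn.Linear(resnet.fc.in_features, NUM_CLASSES)
    return resnet

def classify_slice(segnet, resnet, mri_slice):
    # mri_slice: (1, 1, H, W) preprocessed sMRI slice.
    with torch.no_grad():
        feature_maps = segnet(mri_slice)   # assumed shape (1, NUM_FEATURE_MAPS, H, W)
        logits = resnet(feature_maps)      # (1, NUM_CLASSES)
    return logits.softmax(dim=1)
```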

Table 2 Feature and feature measure

4.1 SegNet Architecture

SegNet comprises an encoder and a corresponding decoder, followed here by image classification using ResNet. The network construction is depicted in Fig. 2. The encoder comprises 13 convolution layers corresponding to the first 13 convolution layers of VGG16 [39], which was designed for object classification. Hence, the training process can be initialized with weights pretrained for classification on large datasets [40]. Further, the fully connected layers are discarded in order to preserve high-resolution feature maps at the output of the encoder; this also decreases the number of parameters in the SegNet encoder substantially, from 134 M to 14.7 M, in comparison with other recent architectures [41, 42]. Every encoder layer has a corresponding decoder layer, so the decoder also contains 13 layers. The final decoder output is fed to a multi-class softmax classifier that generates class probabilities for each pixel separately.

Each encoder, as shown in Fig. 3, performs convolution with a filter bank to generate a set of feature maps. These are then batch-normalized [43, 44], after which an element-wise rectified linear unit (ReLU) is applied. Next, max-pooling with a 2\(\times \)2 non-overlapping window and stride 2 is performed, and the resulting output is subsampled by a factor of 2. Max-pooling is applied to achieve invariance to small spatial translations in the input image. Subsampling provides a large spatial context (input-image window) for each feature in the map. While additional layers of max-pooling and subsampling achieve more translation invariance for robust classification, there is a corresponding loss of spatial resolution in the feature maps. This increasing loss of boundary detail is detrimental to segmentation, where boundary delineation is crucial. Hence, it is necessary to capture and store boundary information in the encoder feature maps before subsampling is applied. If memory during inference were unconstrained, all encoder feature maps (after subsampling) could be stored; this is usually not the case in practical implementations, so a more efficient way of storing this information is applied.
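As a minimal sketch (assuming a PyTorch implementation), one encoder stage of the kind described above could look as follows; the number of convolutions per stage is an assumption.

```python
import torch.nn as nn

# Minimal sketch of one SegNet encoder stage: conv -> batch norm -> ReLU,
# then 2x2 max-pooling with stride 2 that also returns the pooling indices
# needed later by the matching decoder.
class EncoderStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

    def forward(self, x):
        x = self.features(x)
        x, indices = self.pool(x)   # indices are stored for decoder upsampling
        return x, indices
```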

This consists of storing only the max-pooling indices; that is, the location of the maximum feature value in each pooling window is memorized for every encoder feature map. In principle, this can be done using 2 bits for each 2\(\times \)2 pooling window and is therefore far more efficient than storing the feature maps in floating-point precision. This lower memory usage causes a slight loss of accuracy but is well suited to practical implementations. The corresponding decoder in the decoder network upsamples its input feature map using the memorized max-pooling indices from the corresponding encoder feature map. This step produces sparse feature maps; the SegNet upsampling process is illustrated in Fig. 4. These feature maps are then convolved with a trainable decoder filter bank to produce dense feature maps, and batch normalization is applied to each of them. The decoder corresponding to the first encoder (the one closest to the input image) produces a multi-channel feature map, although its encoder input has only the RGB channels. The other decoders in the network produce feature maps with the same size and number of channels as their encoder inputs. The high-dimensional feature representation at the output of the final decoder is fed to a trainable softmax classifier, which classifies each pixel independently. The predicted segmentation corresponds to the class with maximum probability at each pixel.
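A corresponding decoder stage could, under the same assumptions, use the stored indices for max-unpooling before the trainable convolution; the sketch below applies only convolution and batch normalization after unpooling, matching the description above.

```python
import torch.nn as nn

# Minimal sketch of one SegNet decoder stage: the stored max-pooling indices
# from the matching encoder drive a sparse upsampling (max-unpooling), and a
# trainable filter bank then densifies the result.
class DecoderStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x, indices, output_size):
        x = self.unpool(x, indices, output_size=output_size)  # sparse map
        return self.features(x)                               # dense map
```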

The simple SegNet comprises four encoders and four decoders. The encoders perform max-pooling and subsampling, and the corresponding decoders upsample their inputs using the stored max-pooling indices. Batch normalization is applied after every convolution layer of the encoder and decoder. No biases are used after the convolutions, and no ReLU nonlinearity is present in the decoder. Moreover, a constant 7 \(\times \) 7 kernel is used in all layers of the encoder and decoder to provide a wide context for smooth labeling, i.e., a feature in the deepest feature map layer (the 4th layer) can be traced back to a context window in the input image of 256\(\times \)180 pixels. Such a simple SegNet allows several distinct variants to be investigated and trained in a shorter time span.

Fig. 4
figure 4

SegNet upsampling [44]

4.2 ResNet-101 Architecture

Every added layer of ResNet [38] uses identity mapping as a key component. If a newly added layer can be trained toward the identity function \(f(x) = x\), the new model is at least as expressive as the original one; and since the new model may fit the training data better, adding the layer can only make it easier to reduce the training error. Furthermore, with a residual design the identity mapping is obtained by learning the zero function \(f(x) = 0\), the simplest function a layer can represent. These considerations are quite subtle; however, they led to a surprisingly simple solution, the residual block. This design has had an enormous effect on how deep neural networks are built.

Consider a portion of a neural network, as described in Fig. 5, with input x. Assume that the desired underlying mapping to be learned is f(x), which is then used as the input to the activation function. The portion inside the dashed box in the left image of Fig. 5 must learn the mapping f(x) directly. This can be difficult when the extra layer is not needed and it would be better simply to retain the input x. The portion inside the dashed box in the right image of Fig. 5 only needs to parameterize the deviation from the identity, because the block returns \(x + f(x)\). In practice, the residual mapping is easier to optimize: to obtain the identity mapping, it is only required to drive \(f(x)\) to zero. The right-side image in Fig. 5 shows the basic ResNet residual block.
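The following fragment is a conceptual sketch only: the block returns \(x + f(x)\), so learning the identity amounts to driving the residual toward zero.

```python
import torch
import torch.nn.functional as F

# Conceptual sketch of the residual idea: the block only has to learn the
# residual f(x); returning x + f(x) makes the identity mapping the default
# behaviour (obtained when f(x) collapses to zero).
def residual_forward(x: torch.Tensor, f) -> torch.Tensor:
    return F.relu(x + f(x))   # shortcut adds the input before the final ReLU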

Fig. 5
figure 5

The comparison between a regular block (left side) and a residual block (right side). In the latter, the convolution layers are bypassed by the shortcut connection [45]

ResNet adopts VGG’s full 3 \(\times \) 3 convolution layer design. The residual block contains two 3 \(\times \) 3 convolution layers with the same number of output channels. Each convolution layer is followed by a batch normalization layer and a ReLU activation function. A shortcut connection then skips these two convolution operations and adds the input directly before the final ReLU activation function. This design, depicted in Fig. 6, requires that the output of the two convolution layers have the same shape as the input so that they can be added. If the number of output channels or the stride is to be changed, an additional 1 \(\times \) 1 convolution layer must be introduced to transform the input into the required shape for the addition.
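A minimal sketch of such a residual block, with the optional \(1 \times 1\) projection on the shortcut, is given below; it follows the standard ResNet design and is not claimed to be the exact code used in this work.

```python
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a basic ResNet residual block: two 3x3 convolutions with batch
# normalization; an optional 1x1 convolution on the shortcut matches the
# shape when the stride or channel count changes.
class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Projection shortcut only when the output shape differs from the input.
        self.shortcut = None
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        identity = x if self.shortcut is None else self.shortcut(x)
        return F.relu(out + identity)   # add the input before the final ReLU
```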

Fig. 6
figure 6

Left: regular ResNet block; right: ResNet block with \(1 \times 1\) convolution [45]

Table 3 ResNets comparison [46]

When the shortcut crosses feature maps of two different sizes, it is performed with a stride of 2. Each ResNet block has either two layers (used in smaller architectures such as ResNet-18 and ResNet-34) or three layers (used in ResNet-50, -101 and -152). Replacing each 2-layer block in the 34-layer network with the 3-layer bottleneck block yields a 50-layer ResNet (refer to Table 3). ResNet-101 and ResNet-152 use more of these 3-layer blocks (refer to Table 3). Despite its increased depth, the 152-layer ResNet, at 11.3 billion FLOPs, has lower complexity than VGG-16 or VGG-19, at 15.3 and 19.6 billion FLOPs, respectively. Table 3 describes the different ResNet architectures.
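For illustration, the 3-layer bottleneck block used in ResNet-50/101/152 can be sketched as follows; this follows the standard design, and the channel widths are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the 3-layer bottleneck block of ResNet-50/101/152: a 1x1
# convolution reduces the channel count, a 3x3 convolution operates on the
# reduced representation, and a final 1x1 convolution restores the width.
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_ch, mid_ch, stride=1):
        super().__init__()
        out_ch = mid_ch * self.expansion
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_ch)
        self.shortcut = None
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        identity = x if self.shortcut is None else self.shortcut(x)
        return F.relu(out + identity)
```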

5 Experiments and Results

5.1 Experimental Setup

To evaluate the performance of the proposed architecture in classifying Alzheimer’s disease into CN, MCI and AD from test sMRI, extensive experiments were carried out. The proposed architecture selects both elementary and class-specific features. The experiments use area, length and thickness voxel features. Specifically, the whole dataset is randomly shuffled and first used for segmentation: the sMRI training set is segmented with SegNet to obtain segmented feature images. Then, all SegNet-segmented feature output images are used to train the ResNet-101 classifier. Notably, training of the ResNet-101 classifier is independent of the feature extraction models.

The performance of SegNet + ResNet-101 was verified and assessed on controls and subjects for the three-label classification: CN, MCI and AD. SegNet was first applied to the ADNI dataset for feature extraction, and ResNet-101 was subsequently applied to the ADNI dataset for three-label classification. As shown in Fig. 7, each classification experiment comprised three stages: (i) training, (ii) validation and (iii) testing. The sMRI feature image data were randomly split into a larger training set (216 images in total, 72 per label) and a smaller validation set (24 images in total, 8 per label). New sMRI images were used as the testing set (again 80 images). Feature augmentation was applied to the selected segmented images for training and validation in order to generate additional synthetic features and thus avoid over-fitting, which can occur when a fully connected layer contains many parameters. Supplying a classifier with more training and validation data reduces the over-fitting problem. The feature augmentation approach comprised segmenting the regions and contouring the segmented regions.
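The stratified 216/24 split described above could be reproduced, for example, as in the following sketch; the helper name and the file-list representation are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

# Hypothetical sketch of the stratified 216/24 split described above:
# 72 training and 8 validation feature images per label (CN, MCI, AD).
def split_feature_images(image_paths, labels, seed=0):
    train_paths, val_paths, train_y, val_y = train_test_split(
        image_paths, labels,
        test_size=24,        # 8 images per label held out for validation
        stratify=labels,     # keep the three labels balanced
        random_state=seed,
    )
    return train_paths, val_paths, train_y, val_y
```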

Fig. 7
figure 7

Flowchart of the major stages of the experiments carried out

5.2 Image Preprocessing

Here, a fully automated segmentation method is used to perform regional segmentation on the whole sMRI set and obtain the useful ROI data. Skull stripping is carried out as the image preprocessing operation on the raw sMRI dataset, and its output is depicted in Fig. 8. The preprocessing stage therefore comprises skull stripping using an automated algorithm.
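The skull-stripping algorithm itself is not specified here; purely as a hypothetical illustration, a crude intensity-and-morphology approach on a single slice might look as follows.

```python
import numpy as np
from scipy import ndimage
from skimage import filters

# Hypothetical illustration only: the skull-stripping algorithm used in this
# work is not detailed. A crude approach on a 2D slice: threshold, keep the
# largest connected component, fill holes, erode the rim, and mask.
def strip_skull(slice_2d: np.ndarray) -> np.ndarray:
    thresh = filters.threshold_otsu(slice_2d)
    mask = slice_2d > thresh
    labeled, n = ndimage.label(mask)
    if n > 0:
        sizes = ndimage.sum(mask, labeled, range(1, n + 1))
        mask = labeled == (np.argmax(sizes) + 1)        # largest component
    mask = ndimage.binary_fill_holes(mask)
    mask = ndimage.binary_erosion(mask, iterations=3)   # peel off scalp/skull rim
    return slice_2d * mask
```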

Fig. 8
figure 8

Skull stripping. Left: with skull; right: without skull

5.3 Feature Extraction Using SegNet

The white matter, grey matter and pial regions are extracted from the brain. Cortical thickness at each point of the cortex is described by the average shortest distance between the white matter and pial surfaces. Similarly, the gyrification index, a measure of the amount of cortex concealed under the sulcal folds relative to the amount of visible cortex within a circular ROI, is employed. Schaer [47] proposed the procedure applied here to measure the gyrification index over the entire cortex. The hippocampus is an increasingly used sMRI biomarker for early identification of AD [25]. The right and left hippocampi are parcellated together. The segmentation technique is based on specific regions defined from prior knowledge of their spatial relationships, obtained algorithmically; it uses variations in pixel intensity to locate and segment the cortical regions. All the extracted derived features and their measures are listed in Table 2.
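As a simplified illustration of the two measures defined above (not the exact procedures of [25, 47]), cortical thickness can be approximated from white-matter and pial surface point clouds, and the gyrification index computed as the ratio described in the text; the point-cloud representation and variable names below are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

# Simplified illustration of the feature measures described above; the
# surface point-cloud representation is an assumption.
def mean_cortical_thickness(white_pts: np.ndarray, pial_pts: np.ndarray) -> float:
    # Average of the shortest distances from white-matter surface points to
    # the pial surface (and vice versa), a symmetric approximation.
    d_wp = cKDTree(pial_pts).query(white_pts)[0]
    d_pw = cKDTree(white_pts).query(pial_pts)[0]
    return 0.5 * (d_wp.mean() + d_pw.mean())

def gyrification_index(buried_cortex: float, visible_cortex: float) -> float:
    # Ratio of cortex concealed under the sulcal folds to the visible cortex
    # within the circular ROI, following the description in the text.
    return buried_cortex / visible_cortex
```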

Fig. 9
figure 9

Feature outputs from SegNet

Figure 9a–c shows the feature outputs from SegNet for ADNI CN, MCI and AD subject MRIs, respectively. In each case the outputs are: (a) segmented grey matter; (b) segmented white matter; (c) contoured gyri and sulci; (d) contoured cortical surface; (e) segmented hippocampus; (f) contoured cortical thickness; and (g) segmented CSF.

5.4 Classification Using ResNet-101

For SegNet-based feature extraction and for training ResNet-101 with those features, the ADNI dataset is very convenient because a label exists for every image; therefore, ADNI images are employed. The same ADNI dataset is also used to assess the performance of the AD classifier.

The standard ADNI dataset comprises sMRI images with three labels, CN, MCI and AD, with the split shown in Table 1. The segmentation feature images are extracted from the ADNI dataset by annotating them with region labels such as hippocampus, cortex, grey matter, white matter and CSF, corresponding to the detection features described in Table 2. First, for efficient feature segmentation, the sMRI images are resized to 150 \(\times \) 150 pixels. Next, for good feature extraction performance, skull stripping is adopted as a preprocessing step, as described in Fig. 8, and the resulting images are fed to SegNet for segmentation and efficient detection of brain-region features. Figure 9 depicts the intermediate outputs of this experiment. The proposed method was implemented on a CPU, which was sufficient to obtain the intended results.

Table 4 Confusion matrix parameters of classifier

Table 4 lists the confusion matrix statistics of the ResNet-101 classifier, and Fig. 10 depicts the confusion matrix of the classifier when classifying CN, MCI and AD. The testing experiment was carried out using 240 images from the ADNI dataset; CN images and images of the two AD severity levels (MCI and AD) were considered, as per Table 1.
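The per-class statistics reported in Table 4 can be derived from the confusion matrix of Fig. 10 as sketched below; the label ordering (CN, MCI, AD) is assumed.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Sketch of how per-class statistics can be derived from the 3x3 confusion
# matrix (labels assumed ordered CN, MCI, AD).
def per_class_stats(y_true, y_pred, labels=("CN", "MCI", "AD")):
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    stats = {}
    for i, name in enumerate(labels):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp
        fp = cm[:, i].sum() - tp
        tn = cm.sum() - tp - fn - fp
        stats[name] = {
            "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
            "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
            "accuracy": (tp + tn) / cm.sum(),
        }
    return stats
```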

Fig. 10
figure 10

Confusion matrix of the classifier when classifying the CN, MCI and AD


Table 5 Comparison of earlier and proposed methods in segmentation of brain MRI and classification of AD using deep learning models

The comparisons between earlier methods and the proposed segmentation of brain MRI and classification of AD are presented in Table 5. It can be seen that segmenting the brain images with a deep learning model before they are input to the classifier increases the AD classification performance.

6 Conclusion

The current investigation was performed on the premise that a productive image segmentation method, specifically for brain MRI, based on a deep learning model can enhance classification accuracy. The proposed SegNet-based deep learning segmentation method has shown a positive indication in extracting the AD-relevant local morphological brain features required to classify the AD condition, accurately distinguishing the more severe conditions from CN and progressive MCI. Many researchers have applied the VBM (voxel-based morphometry) technique, which requires more time and resources.

Although the current approach needs further elaborate investigation, it supplies very good results and strongly demonstrates that deep learning-based feature segmentation is very helpful in increasing classifier performance. The classification here is a three-class problem for overall detection of the AD condition using a ResNet model. The striking benefit of using the ADNI MRI image dataset for feature extraction and classifier training is that it permits the classifier to directly learn local morphological brain features rather than having to discover features on its own from raw images. The same ADNI dataset is used to test classification performance because its AD condition labels were already available. In practice, when raw images are applied directly to a deep neural network for assessing the AD condition, it cannot accurately predict and grade those images; the input images essentially need to be adequately segmented into images bearing the localized features. A deep neural network has, in principle, no difficulty learning to detect diseased images, but this is feasible only when a huge volume of healthy and diseased images is present in the dataset. Hence, for classification with a medium-sized dataset, it is better to supply the classifier with ROI-segmented images (grey matter, white matter, contoured gyri and sulci, contoured cortical surface, hippocampus, contoured cortical thickness, CSF) rather than raw images.