1 Introduction

Dementia is a condition in which the cognitive functions of the brain gradually deteriorate, causing a loss of ability in the activities of daily life. The most common type of dementia worldwide is Alzheimer's disease (AD), which accounts for about 80% of cases. It is estimated that approximately 135 million AD cases will occur worldwide by 2050 [1]. AD occurs as a result of damage to or destruction of the neurons that provide cognitive functions within the brain. This damage causes physical changes in the brain over time. As a result, the person's physical activity decreases over time, and bed care is required in the last phase. Ultimately, AD is a fatal disease [2].

The symptoms of AD are the decline and eventual loss of thinking, memory, and behavioral responses over the years. Pharmacological and non-pharmacological methods are used in the treatment of AD cases. However, these methods do not completely cure AD; they only delay the progression of the disease for a certain time [2, 3]. As a result, the early diagnosis of AD is important, and so is detecting the disease stage of AD cases using brain images. Many studies on AD have recently been conducted using deep learning models and methods. Basaia et al. [4] used healthy and AD brain MR data. They used the transfer learning approach to train a convolutional neural network (CNN). They performed a preprocessing step using data augmentation techniques to ensure enough data for the proposed CNN model. The binary classification success was 99.2%. M. Liu et al. [5] performed segmentation and classification together using AD data. The dataset was organized into three classes: mild cognitive impairment (MCI), AD, and normal control (NC). They trained a DenseNet deep learning model and achieved 88.9% success by using the Softmax method in the classification layer of the model. Lu et al. [6] performed a binary classification using AD and healthy brain MR images. The researchers used the Visual Geometry Group-16 (VGG-16) and MobileNet deep learning models. The classification success was 93% with the VGG-16 model and 98% with the MobileNet model. T. Shen et al. [7] performed a classification task using stable MCI and progressive MCI data from AD cases. The dataset includes PET images. They applied the Region-of-Interest (ROI) method to each image in the dataset and used Deep Belief Networks (DBN). The classification success with the support vector machine (SVM) method was reported as 86.6%. Z. Xiao et al. [8] classified AD using 2D and 3D brain images. In their three-class dataset, they extracted features using the Gray Level Co-occurrence Matrix (GLCM) and Gabor filter techniques. In addition, features were selected with the Voxel-based Morphometry (VBM) method. They achieved 92.86% classification success using the extracted features with the SVM classifier.

As the related literature shows, deep learning models for computer-based image analysis have recently been developed in this area. Since deep learning models learn features automatically instead of relying on handcrafted feature engineering, they are more efficient than traditional machine learning models [9, 10]. This explains why a deep learning model has been preferred in this study. The primary goal of this study is to enhance the Magnetic Resonance (MR) images using the DeepDream, Fuzzy Color Image Enhancement (FCIE), and Hypercolumn techniques. We then test whether the enhanced MR images contribute to the classification performance by selecting the most efficient features from these three image sets. Deep features are extracted from the pretrained VGG-16 model fed with the enhanced MR image sets, and these features are concatenated. In the last step, the 1000 most efficient features are selected with Linear Regression (LR), and the classification task is then carried out using the SVM classifier. The results show that the proposed model is satisfactory in detecting AD stages. The proposed approach also aims to reveal the effects of the enhancement techniques by comparing the classification performance of the model on the original and the enhanced datasets separately. The VGG-16 model is preferred in this study because it is used in the Hypercolumn technique and its number of parameters is not too large. Also, our hardware resources are sufficient to train and test VGG-16 in a reasonable time [11, 12]. In other words, our computational environment could not run a more complex deep learning model that requires a larger number of parameters. The proposed approach offers a different perspective compared with the studies reviewed in the literature. The contributions of this study to the field can be summarized as enhancing brain MR images with the mentioned techniques and selecting efficient features with the proposed approach.

The rest of this study is organized as follows: the dataset, deep learning models, methods, techniques, and the proposed approach are described in Sect. 2. The results are reported in Sect. 3. Section 4 consists of the discussion and concluding remarks.

2 Materials and methods

2.1 Dataset

The AD dataset was created by the researcher Sarvesh Dubey [13]. The MR images in the dataset have varying resolutions and were collected from different websites. The hybrid dataset used in this study was shared on the web. The dataset consists of a total of 6400 images in the JPG file format. In its original form, the dataset is divided into two parts: the training data consist of 5121 images, whereas the test data consist of 1279 images. In this study, the dataset was separated into a training set of 75% and a test set of 25%. The dataset is divided into classes containing four phases of AD. These phases are Mild Demented (MID), Moderate Demented (MOD), Non-Demented (NOD), and Very Mild Demented (VMD) [13]. No detailed information was given about the patients from whom the dataset was obtained. For this reason, it is not known whether the images in the dataset are homogeneous or heterogeneous. Sample images representing the classes of the dataset are shown in Fig. 1.

Fig. 1 Example sub-dataset for the four classes of AD images: (a) MID, (b) MOD, (c) NOD, and (d) VMD

2.2 Deep learning model: VGG-16

The VGG-16 architecture is a convolutional model that became well known through the ImageNet competition in 2014. The model consists of convolutional layers, pooling layers, and fully connected layers. It produces activation features by sliding 3 × 3 filters over the input of each convolutional layer [14]. The pooling layer reduces the size of the input features, which lowers the computational cost of the model and also helps to limit overfitting. The fully connected layers prepare the probability values of the features before the classification process [15,16,17]. Among the fully connected layers, the FC-6 and FC-7 layers each contain 4096 features, and the FC-8 layer contains 1000 features. The input size of the VGG-16 model is 224 × 224 pixels [16]. The general structure of the VGG-16 model is shown in Fig. 2.

Fig. 2 VGG-16 architecture and general structure
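To make the layer sizes above concrete, the following minimal sketch counts the weights and biases of a VGG-16 network. It assumes the standard published configuration (thirteen 3 × 3 convolutional layers, five 2 × 2 pooling layers, and the three fully connected layers FC-6, FC-7, and FC-8); the function and constant names are illustrative and do not come from the study.

```python
# Channel widths of the standard VGG-16 configuration (an assumption here);
# "M" marks a 2x2 max-pooling layer that halves the spatial resolution.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]  # FC-6, FC-7, FC-8


def vgg16_parameter_count(input_size=224, in_channels=3, kernel=3):
    """Count the weights and biases of VGG-16 for a square input image."""
    total, channels, size = 0, in_channels, input_size
    for layer in VGG16_CONV:
        if layer == "M":
            size //= 2  # pooling has no trainable parameters
        else:
            # kernel*kernel weights per input/output channel pair, plus biases
            total += kernel * kernel * channels * layer + layer
            channels = layer
    features = channels * size * size  # flattened input of FC-6
    for width in VGG16_FC:
        total += features * width + width
        features = width
    return total


if __name__ == "__main__":
    print(vgg16_parameter_count())
```

Under these assumptions, most of the parameters sit in the fully connected layers, which is also where the 4096-dimensional FC-7 features used later in this study come from.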

The pretrained VGG-16 model was used in the experimental steps of this study. After the FC layers of the VGG-16 model, the SVM method was used as a classifier. VGG-16 was used with its default parameters. Also, important parameters and preferred values of the VGG-16 model [18] are given in Table 1.

Table 1 Preferred important parameter values of the VGG-16 model

We preferred the VGG-16 model to ensure a reasonable training time for the CNN model while achieving high generalization performance [19]. As the number of compute nodes increases, the cost of model training also increases in terms of both hardware resources and time [20]. Related papers have already drawn attention to these issues [21,22,23,24,25]. In this context, we chose the VGG-16 model by taking into account the related literature and the hardware resources available to us.

2.3 Machine learning: SVM method

SVM is a machine learning method that is commonly used in regression and classification tasks and finds a decision boundary between the features to be classified. The method constructs this decision boundary as a hyperplane in the feature space, which divides the features into two parts. SVM then calculates the probabilities of the features based on this boundary, and the processed feature is assigned to whichever class has the higher probability value [26]. The operation of the SVM method is shown in Fig. 3, and the mathematical formulas used in the SVM method are given in Eqs. (1)-(3). Here, \(\vec{x}_{i}\) denotes the feature vector of the \(i\)th sample and \(y_{i}\) its class label. The \(\vec{w}\) parameter is the weight vector that determines the margin width, and the \(b\) parameter represents the bias value. The multi-class classification process of the SVM method is shown in Fig. 3b. Here, the basic logic is to solve a problem with \(k\) classes by considering \(k(k-1) / 2\) class pairs. For example, assume that \(k\) is 3. According to the formula, \(3(3-1) / 2 = 3\) pairs are obtained. With respect to Fig. 3b, these pairs are red-blue, blue-yellow, and red-yellow. Three probability values are then computed for each feature, and each feature is assigned to the class with the highest probability value [27].

$$ u = \vec{w} \cdot \vec{x} - b $$
(1)
$$ \, \frac{1}{2}\left\| {\vec{w}} \right\|^{2} $$
(2)
$$ y_{i} (\vec{w} \cdot \vec{x}_{i} - b) \ge 1,\forall i $$
(3)

Fig. 3 Classification representation of the SVM method: (a) binary classification, (b) multiple classification [28]

One of the reasons why the SVM method was preferred for this study is its strong potential to provide solutions to data analysis problems encountered in daily life. The other reason is that it can efficiently handle multi-class problems in pattern recognition and classification [29, 30]. The performances of other machine learning methods (discriminant analysis, nearest neighbor, random forest, etc.) were also analyzed. However, SVM was chosen as the classification method in this study because it gave the best performance. The other parameters of the SVM method were set as follows: the kernel scale was selected automatically, a box constraint level was specified, and the one-vs-one multi-class method was selected.
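As a rough illustration of Eq. (1) and of the pairwise scheme with \(k(k-1)/2\) classifiers, the sketch below evaluates a linear decision value for every class pair and lets the pairs vote. The hyperplanes are hand-picked, hypothetical values, and majority voting is an assumption made for simplicity; the study itself describes the final choice in terms of the highest probability value.

```python
from itertools import combinations


def decision_value(w, x, b):
    """Eq. (1): u = w . x - b for one binary classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def one_vs_one_predict(x, classifiers, classes):
    """Vote over all k(k-1)/2 class pairs; each classifier is a (w, b) tuple.

    A non-negative decision value votes for the first class of the pair,
    otherwise the second class of the pair receives the vote.
    """
    votes = {c: 0 for c in classes}
    for pair in combinations(classes, 2):
        w, b = classifiers[pair]
        winner = pair[0] if decision_value(w, x, b) >= 0 else pair[1]
        votes[winner] += 1
    return max(classes, key=lambda c: votes[c])


# Hypothetical hyperplanes for three classes in a 2-D feature space.
classes = ["red", "blue", "yellow"]
classifiers = {
    ("red", "blue"): ([-1.0, 0.0], 0.0),     # red if x0 < 0
    ("red", "yellow"): ([0.0, -1.0], 0.0),   # red if x1 < 0
    ("blue", "yellow"): ([1.0, -1.0], 0.0),  # blue if x0 > x1
}

if __name__ == "__main__":
    print(one_vs_one_predict([3.0, 1.0], classifiers, classes))
```

With three classes the loop visits exactly three pairs, matching the red-blue, blue-yellow, and red-yellow example in the text.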

2.4 DeepDream

DeepDream is a visualization technique, inspired by the way the human brain imagines, that works on the features of the input images. DeepDream was developed by Google, and its open-source code is supported by the TensorFlow library. DeepDream aims to find patterns in images through algorithmic pareidolia and to amplify the patterns it finds using a CNN. That is, it makes the patterns (feature vectors carrying the label of an object) that a pretrained network has learned appear in a particular image. For example, after a network has been trained on a set of cat images, cat faces become a DeepDream-defined pattern. When the network is then fed with a sky image, the DeepDream algorithm creates images of cats on objects in the sky (clouds, etc.). The overall process of the DeepDream neural network model is illustrated in Fig. 4. As a result, the DeepDream algorithm is useful for rendering visual content in surreal and abstract styles through over-processed images [31, 32].

Fig. 4 The overall process of the DeepDream neural network model

The working logic of the DeepDream algorithm is as follows: when an image is fed to a trained neural network model, neurons fire and activations occur. The DeepDream algorithm selects some of these neurons and lets them fire more strongly than the others, that is, it increases their activation. This activation boost is accomplished by gradient ascent on the input image. The process is repeated until the image contains all the features that the particular layer was originally looking for [31].
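The gradient-ascent idea described above can be sketched on a toy problem. The example below replaces the CNN with a single hypothetical neuron, \(\tanh(\vec{w} \cdot \vec{x})\), and repeatedly moves the input in the direction that increases its activation. The neuron, its weights, the step size, and the gradient normalization are all assumptions chosen for illustration and are not the configuration used in this study.

```python
import math


def activation(w, x):
    """Activation of one toy neuron: tanh(w . x)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def activation_gradient(w, x):
    """Gradient of tanh(w . x) with respect to the input x."""
    a = activation(w, x)
    return [(1.0 - a * a) * wi for wi in w]


def dream(w, x, steps=20, step_size=0.1):
    """Gradient ascent on the input: x <- x + step_size * normalized gradient."""
    x = list(x)
    for _ in range(steps):
        grad = activation_gradient(w, x)
        # Normalize by the mean absolute gradient so the step size is stable.
        scale = sum(abs(g) for g in grad) / len(grad) + 1e-8
        x = [xi + step_size * g / scale for xi, g in zip(x, grad)]
    return x


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25]   # hypothetical neuron weights
    x0 = [0.1, 0.2, -0.3]   # hypothetical input "image" with three pixels
    print(activation(w, x0), activation(w, dream(w, x0)))
```

In a real DeepDream setup the gradient would be obtained by backpropagation through the selected CNN layer, while the update of the input stays the same in spirit.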

In this study, an enhanced version of the AD dataset was produced using the DeepDream technique. Examples of the enhanced data are shown in Fig. 5.

Fig. 5 Sub-data samples of the original dataset obtained by the DeepDream technique

2.5 Fuzzy color image enhancement (FCIE) algorithm

The FCIE algorithm plays an important role in image analysis. The algorithm consists of three stages: image coding, fuzzy enhancement, and image decoding, as shown in Fig. 6.

Fig. 6 The stages of the FCIE algorithm

Through these stages, the gray-level intensities in each image are mapped to a fuzzy plane. In the first stage, each image in the dataset is converted from the gray-level domain to the fuzzy domain. At this stage, a membership value is assigned to each pixel. That is, each gray-level pixel receives a membership degree depending on its position in the histogram. In the second stage, the membership values are modified in order to enhance the image in the fuzzy domain. In the last stage, the fuzzy-enhanced images are decoded to convert them back into gray-level images. Each pixel in the fuzzy domain is converted into a gray-level pixel according to its membership degree. Generally, dark pixels have a low membership degree, whereas light pixels have a higher membership degree [33,34,35]. The membership values are calculated with Eqs. (4)-(6). In these equations, \({F}_{d}\) and \({F}_{e}\) are conversion coefficients, \({f}_{\mathrm{max}}\) denotes the maximal gray value, \({f}_{ij}\) denotes the gray level of the \(\left(i, j\right)\)th pixel, and \({T}^{\left(r\right)}\) is defined as \(r\) successive applications of \(T\) [35].

$${\mu }_{ij}={\left[1+\frac{{f}_{\mathrm{max}}-{f}_{ij}}{{F}_{d}}\right]}^{-{F}_{e}}$$
(4)
$${\mu }_{ij}^{\prime}={T}^{\left(r\right)}\left({\mu }_{ij}\right) ;\quad r=1,2,3,\dots $$
(5)
$$ T\left( {\mu_{ij} } \right) = \begin{cases} 2\left( \mu_{ij} \right)^{2} ; & 0 \le \mu_{ij} \le 0.5 \\ 1 - 2\left( 1 - \mu_{ij} \right)^{2} ; & 0.5 < \mu_{ij} \le 1 \end{cases} $$
(6)
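Equations (4)-(6) can be sketched in a few lines of code. The example below is a minimal illustration, not the implementation used in this study: the coefficient values \(F_d = 128\) and \(F_e = 2\) are hypothetical, the decoding step is obtained simply by solving Eq. (4) for the gray level, and images are represented as flat lists of gray values.

```python
def membership(f, f_max, f_d, f_e):
    """Eq. (4): map a gray level f to a membership degree in (0, 1]."""
    return (1.0 + (f_max - f) / f_d) ** (-f_e)


def intensify(mu):
    """Eq. (6): push memberships below 0.5 down and those above 0.5 up."""
    if mu <= 0.5:
        return 2.0 * mu * mu
    return 1.0 - 2.0 * (1.0 - mu) ** 2


def gray_level(mu, f_max, f_d, f_e):
    """Decode a membership degree by solving Eq. (4) for the gray level."""
    mu = max(mu, 1e-12)  # guard against division by zero for mu == 0
    return f_max - f_d * (mu ** (-1.0 / f_e) - 1.0)


def enhance(pixels, f_d=128.0, f_e=2.0, r=1):
    """Coding, r applications of T (Eq. 5), and decoding for a list of pixels."""
    f_max = max(pixels)
    out = []
    for f in pixels:
        mu = membership(f, f_max, f_d, f_e)
        for _ in range(r):
            mu = intensify(mu)
        # Clip at zero because decoded values may fall below the gray range.
        out.append(max(0.0, gray_level(mu, f_max, f_d, f_e)))
    return out


if __name__ == "__main__":
    print(enhance([0, 64, 128, 192, 255]))
```

With these assumed coefficients the brightest pixel keeps its value, while mid-gray pixels whose membership falls below 0.5 become darker, which is the contrast-stretching effect the algorithm relies on.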

Because the FCIE algorithm assigns membership degrees based on the histogram distribution of the images, both the speed and the quality of contrast enhancement can be improved for each image [35]. In this study, we enhanced the original dataset using a Python implementation of the FCIE algorithm [36]. Examples of the enhanced images are shown in Fig. 7.

Fig. 7 Sub-data samples of the original dataset obtained by the FCIE algorithm

2.6 Hypercolumn technique

Hypercolumn is a technique that classifies pixels using hypercolumn vectors. That is, each pixel of an image given as input to the model has a hypercolumn vector. This vector holds all the activation features of that pixel across the layers of the convolutional model. Thus, instead of deciding according to the pixel value in the final layer of the convolutional model, the classification process chooses the most efficient feature by examining all the features in the hypercolumn vector. With this technique, the spatial location information of the most efficient feature is carried over from the earlier layers and contributes to the classification process [37, 38].

The essence of the Hypercolumn technique is based on heat maps. After the convolution layers of the model, the technique uses bilinear interpolation, which creates a smooth transition value between two neighboring feature values. In this way, feature maps extracted from different layers of the model are brought to the same size, added, and processed with the sigmoid function. The heat maps extracted from the model are then combined to produce possible output values. This joining is done by the "Concatenate" function in the Hypercolumn technique. We use the Hypercolumn technique to obtain a new dataset by enhancing the original dataset. Hypercolumn is a technique that works together with a deep learning model. In this study, it was implemented in Python along with the VGG-16 model [39]. A new set of images was created by selecting efficient features obtained from the heat maps combined in the last layer of the VGG-16 model [38, 40]. Some images obtained by enhancing the data with the Hypercolumn technique are shown in Fig. 8.

Fig. 8 Sub-image samples obtained by applying the Hypercolumn technique in the VGG-16 model
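The two core operations of the Hypercolumn technique, bilinear resizing of feature maps and per-pixel concatenation, can be sketched as follows. This is a simplified illustration on plain nested lists with hypothetical function names; it aligns the corners of the source and target grids, which is one of several common interpolation conventions, and it omits the sigmoid and heat-map steps mentioned in the text.

```python
def bilinear_resize(fmap, out_h, out_w):
    """Upsample a 2-D feature map (list of rows) with bilinear interpolation."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        # Map the output row to a fractional source row (corners aligned).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along the row first, then between the two rows.
            top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
            bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def hypercolumns(feature_maps, out_h, out_w):
    """Resize every map and stack them so each pixel gets one activation vector."""
    resized = [bilinear_resize(m, out_h, out_w) for m in feature_maps]
    return [[[m[i][j] for m in resized] for j in range(out_w)]
            for i in range(out_h)]


if __name__ == "__main__":
    maps = [[[0, 2], [4, 6]], [[1]]]  # two toy feature maps of different sizes
    print(hypercolumns(maps, 3, 3)[1][1])
```

In a CNN, `feature_maps` would hold activations taken from several convolutional layers, so the vector at each pixel mixes fine spatial detail from early layers with more abstract information from later ones.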

2.7 Network training and feature selection methods

Stochastic Gradient Descent (SGD) is an optimization method that facilitates the training of convolutional networks, contributes to the success of the model, and reduces the error of the model to a minimum by adjusting its parameters. This optimization method is widely used with deep learning models. The SGD method updates the weight parameters of the network during training by selecting some data randomly, instead of using all of the input data of the convolutional network [41, 42]. Also, instead of calculating the total cost of the model, it calculates the cost at the end of each iteration. Thus, it enables convolutional networks to be trained more easily. The update rule is given in Eq. (7). In this equation, \(\Theta \) represents the weight parameter, \(t\) is the time step, and \(\alpha \) is the learning rate. The parameters \({x}^{i}\) and \({y}^{i}\) denote the features of the \(i\)th randomly selected sample from the input data and its target value [43].

$${\Theta }_{t}={\Theta }_{t-1}-\alpha {\nabla }_{\Theta }J\left(\Theta ;{x}^{i},{y}^{i}\right)$$
(7)
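A minimal sketch of the update in Eq. (7) is given below for a one-parameter model \(y = \Theta x\) with a squared-error cost. The model, the cost function, the sample data, and the hyperparameter values are assumptions made for illustration; they are not the settings used to train VGG-16 in this study.

```python
import random


def sgd_fit(samples, alpha=0.05, iterations=200, seed=0):
    """Fit y = theta * x with SGD on the cost J = 0.5 * (theta * x - y) ** 2."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        x, y = rng.choice(samples)    # one randomly drawn sample (x^i, y^i)
        grad = (theta * x - y) * x    # gradient of J with respect to theta
        theta = theta - alpha * grad  # Eq. (7)
    return theta


if __name__ == "__main__":
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # generated by y = 3 * x
    print(sgd_fit(data))
```

Each iteration looks at a single sample rather than the whole dataset, which is what keeps the cost of one update independent of the dataset size.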

Feature selection prevents inefficient features from being processed by deep learning models and thereby contributes to the success of the model. LR is an important statistical method for analyzing medical data. This method allows the relationships between multiple factors to be defined and characterized [44]. In the LR feature selection method, a confidence interval (\(p\)) is constructed from the standard error of each estimated coefficient. The method then tests whether the estimated coefficients differ significantly from zero. If a coefficient satisfies the specified confidence level, it is considered a real effect by the LR. The parameter \(\sigma \) in Eq. (8) is the standard deviation of the coefficient estimate in the LR method. If the standard deviation is small compared to the coefficient, the \(p\) interval narrows and the probability of detecting the actual value increases [45].

$$\widehat{\beta }\sim N(\beta , {\sigma }^{2})$$
(8)

In this study, the LR method was used for feature selection and was implemented in Python [46]. The deep feature set extracted from the fully connected layer (FC-7, 4096 features) of the VGG-16 was used. Each row of this feature set represents one input image, and each column represents one feature. The columns containing the best 1000 features were selected with the LR method and classified using the SVM method.
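One way to rank feature columns with linear regression is sketched below. It is an assumption about how such a ranking could work, not the code cited in [46]: each column is fitted against the label with a univariate least-squares regression, the slope is divided by its estimated standard error (compare Eq. 8), and the columns with the largest absolute ratio are kept. All function names and the toy data are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Slope of the fit y ~ a + b*x divided by its estimated standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        return 0.0  # a constant column carries no information
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)  # needs more than two samples
    if std_err == 0.0:
        return math.inf if b != 0.0 else 0.0
    return b / std_err


def select_top_features(rows, labels, k):
    """Return the indices of the k columns with the largest absolute statistic."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(abs(slope_t_statistic(column, labels)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    rows = [[1.0, 5.0, 0.3], [2.0, 5.0, 0.9], [3.0, 5.0, 0.1],
            [4.0, 5.0, 0.7], [5.0, 5.0, 0.2]]
    labels = [1.1, 1.9, 3.2, 3.9, 5.1]
    print(select_top_features(rows, labels, 2))
```

In the setting of this study, `rows` would hold the deep features of each image and `k` would be 1000; treating the class label as a numeric regression target is a further simplification of this sketch.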

2.8 Proposed approach

The proposed approach is designed to detect all four phases of Alzheimer's disease. Its main idea is to enhance the original dataset with different methods and techniques and to process the enhanced data in deep networks. In this study, the DeepDream, FCIE, and Hypercolumn techniques were used for data enhancement. These techniques were then evaluated with the convolutional model to measure their success.

The proposed approach feeds the three new datasets from the data enhancement phase into the VGG-16 model. The features extracted from the FC-7 layer of the VGG-16 model (3 × 4096 features) are combined to form a dataset with 12,288 features. This dataset is processed with the LR feature selection method to select the top 1000 features. In the last step of the proposed approach, the 1000 features are classified with the SVM method, as shown in Fig. 9.

Fig. 9 The general design of the proposed approach
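The feature bookkeeping of the proposed approach (3 × 4096 = 12,288 concatenated features, reduced to a selected subset) can be illustrated with the short sketch below. The helper names are hypothetical, and the feature values are placeholders rather than real FC-7 activations.

```python
def concatenate_features(*feature_sets):
    """Join per-image feature vectors from several sets, image by image."""
    n_samples = len(feature_sets[0])
    if any(len(fs) != n_samples for fs in feature_sets):
        raise ValueError("all feature sets must describe the same images")
    return [sum((list(fs[i]) for fs in feature_sets), [])
            for i in range(n_samples)]


def select_columns(rows, indices):
    """Keep only the feature columns listed in indices, in that order."""
    return [[row[j] for j in indices] for row in rows]


if __name__ == "__main__":
    # Placeholder FC-7 features for two images from each enhanced dataset.
    deepdream = [[0.1] * 4096 for _ in range(2)]
    fcie = [[0.2] * 4096 for _ in range(2)]
    hypercolumn = [[0.3] * 4096 for _ in range(2)]
    combined = concatenate_features(deepdream, fcie, hypercolumn)
    reduced = select_columns(combined, range(1000))  # stand-in for the LR ranking
    print(len(combined[0]), len(reduced[0]))
```

The stand-in `range(1000)` only demonstrates the shapes involved; in the proposed approach the indices would come from the LR feature ranking.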

3 Results

The necessary coding and analysis were performed using Python 3.6 with Jupyter Notebook. Training of the VGG-16 model was carried out using MATLAB (2019b). Because the local hardware was inadequate, Google Colab was used to run the Python code. The hardware offered by Google Colab consisted of a Tesla K80 GPU graphics card, 12 GB DDR5 memory, and a 2.30 GHz Intel© i5 Xeon (R) processor. For MATLAB, a computer with the following features was used: Windows 10 64-bit OS, a 1 GB graphics card, 4 GB memory, and an Intel© i5 Core 2.5 GHz processor.

Metrics derived from the confusion matrix were used to evaluate the analyses performed in the experiment. The metrics were calculated with Eqs. (9)-(13). They are Sensitivity (Se), Specificity (Sp), F-score (F-scr), Precision (Pre), and Accuracy (Acc). The parameters of the confusion matrix are True-Positive (TP), True-Negative (TN), False-Positive (FP), and False-Negative (FN) [47, 48].

$$\mathrm{Se}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} $$
(9)
$$\mathrm{Sp}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}$$
(10)
$$\mathrm{Pre}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$$
(11)
$$\text{F-scr}=\frac{2\times \mathrm{TP}}{2\times \mathrm{TP}+\mathrm{FP}+\mathrm{FN}}$$
(12)
$$\mathrm{Acc}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$$
(13)
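For a problem with more than two classes, Eqs. (9)-(13) are commonly evaluated per class by treating that class as positive and all others as negative. The sketch below follows this one-vs-rest reading, which is an assumption about how the per-class values were obtained; the confusion matrix in the example is made up and is not a result of this study.

```python
def one_vs_rest_metrics(confusion, cls):
    """Compute Eqs. (9)-(13) for class index cls of a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    Every class is assumed to occur at least once as truth and as prediction.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[i][cls] for i in range(n)) - tp
    tn = total - tp - fn - fp
    return {
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "Pre": tp / (tp + fp),
        "F-scr": 2 * tp / (2 * tp + fp + fn),
        "Acc": (tp + tn) / total,
    }


def overall_accuracy(confusion):
    """Share of all samples that lie on the main diagonal."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total


if __name__ == "__main__":
    toy = [[8, 2, 0], [1, 9, 0], [0, 1, 9]]  # hypothetical 3-class matrix
    print(one_vs_rest_metrics(toy, 0), overall_accuracy(toy))
```

Under this reading, a per-class accuracy counts the true negatives of the other classes as correct, so it can be higher than the overall accuracy computed from the diagonal alone.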

In the first four steps of this study, 25% of the dataset was used as test data and 75% as training data. In the last step, the cross-validation method was applied to the dataset with ten folds (\(k = 10\)). The number of epochs was 75 for training at each step, and the total number of iterations was 5625. In the first step, the VGG-16 model was trained with the original dataset, and the 4096 features extracted from the FC-7 layer of this model were classified by the SVM method. As a result of the classification, accuracies of 99.10% for MID, 99.87% for MOD, 96.98% for NOD, and 96.61% for VMD were obtained. The overall accuracy was 96.31%. The training and validation accuracy and loss graphs of the first step are shown in Fig. 10, and the confusion matrix is shown in Fig. 11. The analysis results of the first step are given in Table 2.

Fig. 10 Accuracy and loss graphs of the VGG-16 model using the original dataset

Fig. 11 Confusion matrix of the VGG-16 model using the original dataset

Table 2 Confusion matrix metric values of the VGG-16 model using the original dataset

In the second step of the experiment, the original dataset was recreated with the DeepDream, FCIE, and Hypercolumn methods. The VGG-16 model was trained on each of the three datasets individually, and the deep features provided by the model were classified using the SVM method. The overall accuracy was 82.0% for the dataset created with DeepDream, 98.94% for the dataset enhanced with the FCIE technique, and 95.38% for the dataset enhanced with the Hypercolumn technique. In this step, the datasets processed with the FCIE and Hypercolumn techniques produced results close to the one achieved with the original dataset. The training accuracy graphs of the enhanced datasets are shown in Fig. 12, and the confusion matrices are shown in Fig. 13. The confusion matrix metric values for this step are given in Table 3.

Fig. 12 Training accuracy graphs obtained by using the data enhancing techniques and approach with the VGG-16 model: (a) DeepDream, (b) FCIE, (c) Hypercolumn

Fig. 13 Confusion matrices obtained by using the data enhancing techniques with the VGG-16 model: (a) DeepDream, (b) FCIE, (c) Hypercolumn

Table 3 Confusion matrix metric values of the VGG-16 model using the dataset enhancing

In the third step of the experiment, the datasets were trained with the VGG-16 model as in the second step. The 4096 deep features extracted from the FC-7 layer of the VGG-16 were considered. The three feature sets obtained using the DeepDream, FCIE, and Hypercolumn techniques were combined to create a new feature set with a total of 12,288 features. The combined feature set was then classified using the SVM. As in the previous steps, 25% of the data was used as test data. The overall accuracy obtained from the classification was 99.69%, which was better than the results of the first two steps. However, since the combined feature set contains a very large number of features (12,288), it was necessary to classify again after selecting the efficient features.

In the fourth step of the experiment, the LR method was used for feature selection. The feature dataset file was processed with the LR code in Python, and the 12,288 features were ranked according to the LR method. The first 1000 features were selected from the ranked list and reclassified using the SVM method. As a result of the classification, an overall accuracy of 99.94% was achieved, which was better than the results of the first three steps. The selection of efficient features in the fourth step thus contributed to the proposed approach. With the proposed approach, the accuracy was 100% for the Mild Demented class, 99.94% for the Moderate Demented class, 100% for the Non-Demented class, and 99.94% for the Very Mild Demented class. The confusion matrices of the third and fourth steps are shown in Fig. 14. Furthermore, the Receiver Operating Characteristic (ROC) curve of the fourth step is shown in Fig. 14c. The metric results of the third and fourth steps are given in Table 4.

Fig. 14 Classification by the SVM method: (a) confusion matrix of features obtained by combining datasets, (b) confusion matrix of the combined datasets after LR feature selection, and (c) ROC curve

Table 4 Analysis results of the proposed approach

In the last step of this study, the cross-validation method was used to check the validity and reliability of the proposed approach on the dataset and to avoid problems such as overfitting or underfitting [49, 50]. Ten folds were chosen (\(k = 10\)). In this fifth step, the combined feature set (12,288 features) was processed with the cross-validation method and classified using the SVM method. The overall accuracy achieved in the classification process was 99.66%, and the time elapsed was 2292 s. Then, the 1000-feature set selected with the LR feature selection method was processed with the cross-validation method and classified with the SVM method. The overall accuracy achieved in the classification process was 99.69%, and the time elapsed was 39 s. The confusion matrices of the fifth step are shown in Fig. 15. Furthermore, the ROC curve of the fifth step is shown in Fig. 15c. The analysis results obtained by the cross-validation method are given in Table 5. The accuracy and the time savings obtained in the fifth step demonstrate the success of the proposed approach, and the cross-validation method confirmed that its results are stable.

Fig. 15 Classification by the SVM method using the cross-validation method: (a) confusion matrix of features obtained by combining datasets, (b) confusion matrix of the combined datasets after LR feature selection, and (c) ROC curve

Table 5 Analysis results obtained using the cross-validation method of the proposed approach (k = 10)
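The ten-fold procedure used in the fifth step can be sketched as an index splitter: the samples are shuffled once, divided into \(k\) folds, and each fold serves as the test set exactly once. The function name, the shuffling, and the fixed seed are assumptions of this sketch and do not describe the tool used in the experiment.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # every k-th index per fold
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(6400, k=10):
        print(len(train), len(test))
```

For the 6400 images of the dataset, each of the ten folds holds 640 images, so every classifier is fitted on 5760 images and evaluated on the remaining 640.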

4 Discussion and conclusion

The symptoms of AD worsen over time, but the rate of progression varies. On average, a person with Alzheimer's lives four to eight years after diagnosis [51]. In this study, the dementia phases of AD patients were investigated using brain MR images. Determining the phase of a person with Alzheimer's is a difficult process for experts. We therefore suggested using a deep learning model and image processing techniques together to support the decision-making process of experts and to facilitate this difficult task. We aimed to increase the performance of the VGG-16 deep learning model selected in this study, although other convolutional models could have been chosen instead. The success rate of the VGG-16 model on the original dataset was 96.31%. We increased this to 98.94% on the dataset enhanced with the FCIE technique, which shows that the FCIE technique contributes to the prediction success of the proposed approach. We observed that the DeepDream method did not meet expectations. However, we still included it in the experiment because its features might prove efficient when the feature sets were combined in the later steps. Overall, the FCIE and Hypercolumn techniques contributed significantly to the experimental process. We did not use the features obtained from the original image data in the combined feature sets, because the aim was to classify the stages of AD successfully with the deep features obtained from the enhanced data. The experimental analysis of this study confirmed that MR images processed with data enhancing techniques and approaches yield successful results in the detection of AD stages.

AD occurs as a result of the loss of cognitive functions in the brain; it first causes forgetfulness of recent events in the patient and later leads to dementia. The prevalence of AD is increasing day by day. With the proposed approach, the AD stages were determined using data obtained from AD patients. The approach enhances the MR images as a preprocessing step, and enhancing the MR images with the Hypercolumn and FCIE techniques proved useful. One of the basic aims was to understand whether dataset enhancement contributes to model performance, and the results supported this. Another significant aim of this study was to select more efficient features by combining feature sets. To this end, the LR feature selection method was employed in the fourth step of the experiment, which resulted in an overall accuracy of 99.94%. The proposed approach, which relies on enhanced MR images, produced promising results in determining AD stages. However, the lack of any statistical information about the patients behind the images in the dataset raises doubts about the accuracy of the approach we recommend. The researcher who provided the data made no statement on this matter. Our concern is therefore that more than one image may have been obtained from the same Alzheimer's patient. This could inflate the training and testing success of the proposed approach, and overfitting may have occurred. Although we applied the cross-validation method in the last step of the experiment to minimize overfitting, it cannot completely eliminate these doubts about the dataset. In addition, the images in the dataset are not of high quality, which may negatively affect the success of the proposed approach.

In future work, we hope to improve the performance of the proposed approach on different datasets by using attention modules (regional focus) on the images.