
1 Introduction

Alzheimer's disease is a progressive neurological condition that causes brain cells to die and the brain to shrink. The continual deterioration in mental, behavioral, and social abilities that it causes, undermining a person's capacity to function independently, makes it the most common cause of dementia [1]. People above 65 are the most prone to the disease (as shown in Fig. 1); the far rarer cases that appear between the ages of 30 and 65 are known as early-onset. In the final stage of the disease, individuals lose their ability to react to their surroundings, converse, and eventually regulate their mobility. They may still pronounce words and expressions, but communication becomes increasingly difficult. Research on Alzheimer's disease has intensified both because of the growing number of patients each year and to remove the manual errors introduced by human visual inspection.

Fig. 1
CNN representation: MRI scans pass through the CNN, followed by classification

In the past decade, we have seen the development of Deep Learning techniques in medical fields. Compared to traditional ML algorithms, Deep Learning, and convolutional neural networks (CNNs) in particular, has dramatically improved performance in this domain.

Basically, a CNN is a Deep Learning algorithm that takes an image as input (as shown in Fig. 1) and learns from it by assigning learnable weights and biases to various objects in the image. It has one input layer at the start, consisting of one neuron per pixel value of the input image, and the last layer of the network contains the labels in one-hot encoding. The layers between the input layer and the output layer are called hidden layers. Each neuron in a hidden layer takes input from a patch of the previous layer, computes a weighted sum, and applies a bias. This can be represented by the following equation,

$$ \text{output}_{p,q} = \operatorname{act}\left(\sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij}\, x_{i+p,\, j+q} + b\right) $$

where n is the size of the filter, act is an activation function, W_ij is the weight matrix, b is the bias, and (p, q) indexes the neuron in the hidden layer. There are three components at the core of a CNN. The first is the convolution operation, which generates feature maps by detecting features in the image. The second is a non-linearity, which lets the network model highly non-linear features. Lastly, pooling (down-sampling) scales down the size of each feature map.
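To make the patch computation concrete, the following is a minimal NumPy sketch of the equation above; the filter size, activation function, and input values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def conv2d_valid(image, W, b, act=relu):
    """Slide an n x n filter W over the image; output neuron (p, q)
    is act(sum_ij W[i, j] * image[i + p, j + q] + b)."""
    n = W.shape[0]
    H, K = image.shape
    out = np.zeros((H - n + 1, K - n + 1))
    for p in range(out.shape[0]):
        for q in range(out.shape[1]):
            patch = image[p:p + n, q:q + n]  # the patch feeding one hidden neuron
            out[p, q] = act(np.sum(W * patch) + b)
    return out

image = np.random.rand(8, 8)  # toy grayscale slice
W = np.random.rand(3, 3)      # a 3 x 3 learnable filter
feature_map = conv2d_valid(image, W, b=0.1)
print(feature_map.shape)      # (6, 6): the feature map shrinks by n - 1
```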

CNNs have covered extraordinary ground in image classification, which plays an important role in the medical field. We therefore propose various transfer learning techniques and CNNs for detecting the stages of Alzheimer's disease. In essence, the model assesses the extent of the affected region of the brain and, based on that, assigns the image to one of four stages.

2 Related Work

Over the past two decades, numerous methods have been published in the area of machine learning, especially in DL since 2013, when research on new neural network architectures gained momentum and DL models grew prominent in medical image processing [1]. Deep learning technologies have shown groundbreaking performance in various fields in recent years (visual object recognition, human action recognition, object tracking, image restoration, de-noising, segmentation tasks, audio classification, brain-computer interaction, etc.) [2]. Following the achievements of DL models in categorizing two-dimensional photos, an increasing number of studies have attempted to integrate DL with medical images [3, 4]. DL architectures such as convolutional neural networks can interpret hidden information in neuroimaging data, find linkages between different sections of an image, produce total cognition, and efficiently capture disease-related pathologies [1]. Deep learning methods have been effectively applied to sMRI, fMRI, PET, and DTI. Based on our research, the most prevalent modality for identifying Alzheimer's disease is MRI, which motivated our use of MRI scans in this paper. Although many researchers have trained deep neural networks from scratch, deep models are frequently impractical to apply because of prolonged convergence and the need for huge datasets [5]. Commonly, a pre-trained CNN is applied to a specific task as an initialization step, and the model is then retrained for the new job by fine-tuning its last layers [6, 7]. This concept of "transfer learning" has proven to be an effective way to train a huge architecture without overfitting it, and it has shown to be faster and more effective than traditional training, especially for cross-domain tasks [8,9,10]. Murala et al. [11] classified 1000 image classes in the ImageNet database. Different researchers later suggested numerous variations and advancements of CNNs for object identification and image categorization, for example VGG [12,13,14,15,16], ResNet [17,18,19], GoogLeNet [23,24,25], R-CNN [26,27,28], and InceptionV3 [20,21,22]. Although training is quicker, spatial correlations between slices are prone to being lost; even so, transfer learning models can match the accuracy of a three-dimensional CNN model trained from scratch.

3 Materials and Methods

3.1 CNN

As we all know, CNNs are really effective when it comes to image classification, because multiple convolutional filters operate in every layer of a CNN, scanning the entire feature matrix and performing dimensionality reduction. Figure 2 shows the architecture of our proposed CNN model. In this model we used a 3-layer convolutional network with a max pooling layer stacked between two convolutional layers. After that we flattened the output and passed the result to a fully connected layer. We also used a Dropout layer and Batch Normalization for regularization. Finally, we used a dense layer with the Softmax activation function as the output layer, where

$$ \text{Softmax}(y_i) = \frac{e^{y_i}}{\sum_{j} e^{y_j}} $$
Fig. 2
Architecture of our CNN model: Conv2D, batch normalization, max pooling, flatten, dense, and dropout blocks with their input and output shapes

After that, we compiled the model using the cross-entropy loss function and the Adam optimizer.
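The following is a hedged Keras sketch of this model; the filter counts, kernel sizes, dense width, and dropout rate are assumptions, since the text specifies only the layer types and their ordering.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(224, 224, 3)),           # MRI scans resized as in Sect. 4.1
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),                 # max pooling between conv layers
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.BatchNormalization(),                 # regularization
    layers.Flatten(),
    layers.Dense(128, activation="relu"),        # fully connected layer
    layers.Dropout(0.5),                         # regularization
    layers.Dense(4, activation="softmax"),       # one unit per AD stage
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",   # cross entropy, as stated above
              metrics=["accuracy"])
```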

3.2 ResNet50

A 50-layer residual neural network, also known as ResNet50, is very efficient: it avoids the vanishing gradient problem, enabling deeper networks than plain architectures, while also helping determine the optimal number of layers.

We used a pre-trained ResNet50 model because its weights and biases have already been trained on millions of images. We then iterate through all the layers and set each one to non-trainable. This freezes the trainable parameters (weights and biases) in every layer of the model so that they are not retrained during our training process. We also added two fully connected layers at the end for classification with the Softmax function. Finally, we compiled the model using the cross-entropy loss function and the Adam optimizer.
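A sketch of this freeze-and-retrain procedure in Keras follows; the pooling layer and the 256-unit width of the first added dense layer are assumptions, as the paper states only that two fully connected layers were appended.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50

def build_transfer_model(base):
    for layer in base.layers:
        layer.trainable = False                     # freeze pre-trained weights and biases
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(256, activation="relu")(x)     # first added fully connected layer
    out = layers.Dense(4, activation="softmax")(x)  # second: one unit per stage
    model = keras.Model(base.input, out)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

resnet_model = build_transfer_model(
    ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3)))
```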

3.3 InceptionV3

InceptionV3 is a convolutional neural network architecture widely used for image classification. It is highly efficient, computationally inexpensive, and uses auxiliary classifiers as regularizers. The basic idea behind Inception is that rather than choosing the kernel size, number of filters, pooling, and so on, we use all of them together, stack their outputs, and let the network learn which combinations to use. The problem with such a module is that too many computations are needed even in a single block. To solve this, Inception uses 1 × 1 convolutions, which reduce the number of computations significantly.
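As a hypothetical illustration of this saving (the tensor sizes below are a common textbook example, not figures from this paper), applying a 5 × 5 convolution with 32 filters directly to a 28 × 28 × 192 volume costs

$$ 28 \times 28 \times 32 \times 5 \times 5 \times 192 \approx 120\,\text{M multiplications}, $$

whereas first shrinking the volume to 16 channels with a 1 × 1 convolution costs

$$ \underbrace{28 \times 28 \times 16 \times 192}_{\approx\, 2.4\,\text{M}} + \underbrace{28 \times 28 \times 32 \times 5 \times 5 \times 16}_{\approx\, 10\,\text{M}} \approx 12.4\,\text{M multiplications}, $$

roughly a tenfold reduction for the same output shape.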

As with ResNet50, we used a pre-trained InceptionV3 model whose weights and biases have already been trained on millions of images, froze all of its layers so they are not retrained during our training process, added two fully connected layers at the end for classification with the Softmax function, and compiled the model using the cross-entropy loss function and the Adam optimizer.
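Under the same assumptions, only the base network changes relative to the ResNet50 sketch in Sect. 3.2, reusing the hypothetical build_transfer_model helper:

```python
from tensorflow.keras.applications import InceptionV3

inception_model = build_transfer_model(
    InceptionV3(weights="imagenet", include_top=False, input_shape=(224, 224, 3)))
```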

3.4 VGG16

Visual Geometry Group 16, also known as VGG16, is a cutting-edge transfer learning model with sixteen layers. Constructed as a deep convolutional neural network, it outperforms baselines on a wide range of datasets beyond ImageNet and is a profoundly used image-recognition model nowadays. Figure 3 shows the architecture of our proposed VGG16 model. We used a pre-trained VGG16 model, as this saves the time required to compute the weights and biases. We iterate through all the layers and set each one to non-trainable, freezing the trainable parameters (weights and biases) in every layer so that they are not retrained during our training process. We also added another dense layer for classification with the Softmax function, as shown in Fig. 3. Finally, we compiled the model using the cross-entropy loss function and the Adam optimizer.

Fig. 3
Architecture of the proposed VGG16 model: input layer, Conv2D and MaxPooling2D convolution blocks, flatten, and dense layers with their input and output shapes

3.5 VGG19

Visual Geometry Group 19, also known as VGG19, is the advanced version of VGG16. We chose this model for its high accuracy and fast training speed. Constructed as a deep convolutional neural network, VGG19 outperforms baselines on a wide range of datasets beyond ImageNet and is one of the most widely used image-recognition models nowadays.

Figure 4 shows the architecture of our proposed VGG19 model. We used a pre-trained VGG19 model, as this saves the time required to compute the weights and biases. We then iterate through all the layers and set each one to non-trainable, freezing the trainable parameters (weights and biases) in every layer so that they are not retrained during our training process. We also added two dense layers for classification with the Softmax function, as shown in Fig. 4. Finally, we compiled the model using the cross-entropy loss function and the Adam optimizer.
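Both VGG variants fit the same hypothetical build_transfer_model pattern sketched in Sect. 3.2; note that the text above adds one dense layer for VGG16 and two for VGG19, so the added head would be adjusted accordingly.

```python
from tensorflow.keras.applications import VGG16, VGG19

vgg16_model = build_transfer_model(
    VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3)))
vgg19_model = build_transfer_model(
    VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3)))
```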

Fig. 4
Architecture of our proposed VGG19 model: input layer, Conv2D and MaxPooling2D convolution blocks, flatten, dense, and dropout layers with their input and output shapes

4 Experiments and Results

4.1 Dataset

The MRI dataset contains four types of scans in both the training and test sets: a total of 6400 images classified into four groups, Mild Demented, Very Mild Demented, Non-Demented, and Moderate Demented.

The images in the dataset were pre-processed to a size of 224 (width) × 224 (height) × 3 (color channels) pixels.
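A hedged Keras sketch of loading such a dataset follows; the directory layout (train/ and test/ folders with one subfolder per class) is an assumption, not stated in the paper.

```python
from tensorflow import keras

train_ds = keras.utils.image_dataset_from_directory(
    "train", image_size=(224, 224), batch_size=64,
    label_mode="categorical")  # one-hot labels for the four classes
test_ds = keras.utils.image_dataset_from_directory(
    "test", image_size=(224, 224), batch_size=64,
    label_mode="categorical")
```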

4.2 Analysis of the Output

The training set is used to determine the accuracy and loss after each epoch; with a batch size of 64 over the 6400 Alzheimer's disease images, this gives 100 steps per epoch.
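An illustrative training call, reusing the model and datasets sketched earlier; the epoch count is an assumption, as the paper does not state it.

```python
# With 6400 training images at a batch size of 64, Keras runs
# 6400 / 64 = 100 steps per epoch automatically.
history = model.fit(train_ds, epochs=20, validation_data=test_ds)
```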

We evaluated the results on our dataset using the F1-score as the performance metric.

$$ F_1\ \text{score} = \frac{2\,TP}{2\,TP + FP + FN} $$

The F1 score is defined as the harmonic mean of precision and sensitivity (recall). Its most favorable value is 1, which is the aim of this research.

A true positive (TP) occurs when the model correctly predicts a particular class, for instance when it correctly identifies the stage of Alzheimer's disease in a scan. Similarly, when the model correctly predicts that a scan does not belong to a particular class, this is a true negative (TN). When the model incorrectly assigns a scan to a class when it actually belongs to another class, this is a false positive (FP) for the predicted class, also known as a TYPE-1 error. Lastly, when the model fails to assign a scan to its actual class, this is a false negative (FN) for that class, also known as a TYPE-2 error.
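For instance, the per-class F1 scores reported in Fig. 5 can be computed with scikit-learn; the labels below are toy values for illustration only.

```python
from sklearn.metrics import f1_score

# toy labels: 0 = mild, 1 = very mild, 2 = non-demented, 3 = moderate
y_true = [0, 1, 2, 3, 2, 2, 1, 0]
y_pred = [0, 1, 2, 2, 2, 2, 1, 3]
print(f1_score(y_true, y_pred, average=None))  # one F1 score per class
```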

The following are the results we obtained after applying the different models (Fig. 5).

Fig. 5
F1 scores of all proposed models (CNN, VGG16, VGG19, ResNet50, and InceptionV3) for the mild demented, very mild demented, non-demented, and moderate demented classes; the non-demented class scores highest under every model, followed by the moderate and mild demented classes

4.3 Output of Our Model

We tested our models on unseen data, i.e., the test dataset, which contained 1279 MRI scans in total. Table 1 shows the accuracy of the various models on the test dataset:

Table 1 Testing accuracy of our models

5 Conclusion

In this paper, we used various transfer learning models to correctly predict cases of AD. Different CNNs were trained on the same dataset. Our proposed model achieved an accuracy of 68.49% on the testing dataset using the ResNet50 architecture. Performance can be further increased by applying data augmentation, hyperparameter tuning, and further modifications to the CNN architecture.