1 Introduction

Alzheimer’s disease (AD) is a type of degenerative neurological brain disorder. It causes progressive cognitive deterioration due to deposition of beta-amyloid and neurofibrillary tangles in the cerebral cortex and subcortical gray matter [1].

Most cases of Alzheimer’s disease are sporadic, occur in the elderly and have an unclear etiology. Individuals with Alzheimer’s disease experience noticeable symptoms such as memory loss only years after the damage to their brain has begun. As the disease progresses, neurons in the brain are damaged or destroyed. Ultimately, the nerve cells in parts of the brain that support basic bodily functions are affected, and patients become bed-bound.

Alzheimer’s disease is the leading cause of dementia. Its symptoms include loss of short-term memory and other cognitive deficits such as language and visuospatial dysfunction, poor judgment, and difficulty handling complex tasks due to impaired reasoning [2].

Apart from inflammation and atrophy, two of the major brain changes associated with Alzheimer’s disease are the accumulation of beta-amyloid protein fragments outside neurons and of an abnormal form of the protein tau inside neurons [1].

Diagnosis is the most crucial and difficult step, and determining whether dementia is caused by Alzheimer’s disease demands highly experienced doctors. Diagnostic approaches include obtaining the individual’s medical and family history of cognitive, psychiatric and behavioral changes, and conducting problem-solving, memory and other cognitive tests together with physical and neurologic examinations. Brain imaging to observe brain volume is a popular diagnostic method because brain volume shrinkage is one of the key signs of Alzheimer’s disease. There is currently no cure for AD, but early detection can help ameliorate the symptoms and slow the progression of neuron damage.

2 Related Work

Chima et al. [3] suggested early diagnosis of AD by applying machine learning to distinctive features such as blood biomarkers. This approach resulted in a true positive rate > 0.79, a true negative rate > 0.70 and an AUROC score > 0.80 at the initial stages of the disease.

Tarek et al. [4] used the OASIS dataset to design a six-layer convolutional neural network, trained on FloydHub’s GPU. An accuracy of 80.25% was obtained after 545 epochs. That paper tries to address a limitation of conventional machine learning algorithms, which need manual feature extraction and may therefore fail to discern complex patterns in image data.

Alzheimer’s disease cannot be diagnosed easily because the magnetic resonance imaging (MRI) data of people with Alzheimer’s disease and of healthy older people differ only negligibly. Jyoti Islam et al. [5] used the OASIS dataset augmented with multiplanar patches to train a densely connected deep neural network, which gave a precision of 75% for the preclinical stage, 99% for the non-demented stage, 62% for stage I (mild) and 33% for stage II (moderate) of Alzheimer’s disease.

Some components of the brain like blood vessels and branching structures that have been affected by amyloid beta may contain pertinent information for the diagnosis of Alzheimer’s disease. Conventional methods do not utilize these features. Sahrim et al. [6] proposed a method which uses branching structures of blood vessels based on tortuosity and density for the detection of AD. Computer vision techniques are used to analyze vascular abnormalities to distinguish between the features of the tissue from people with healthy brains and those with Alzheimer’s disease. An accuracy of 100% was achieved using a combination of the description of branching structures and an accuracy of 90% was achieved by using branches and their paths for classification.

Aradhana Soni et al. [7] suggest the use of a 30 s verb fluency task as a data source for diagnosis of AD. Information is extracted from the concatenated text string of verbs recorded during the task, using natural language processing. The sequence of verbs produced is used along with this information to detect AD with a recurrent neural network (RNN). An accuracy of 76% was obtained with this model.

3 System Design

We propose a method that uses MRI brain scan data from the OASIS dataset to detect Alzheimer’s disease. The features used to train the machine learning models are described in the following subsections.

The proposed system design has seven steps, each performing a particular task involved in building the required target model:

  1. Input

  2. Data visualization

  3. Feature selection

  4. Data transformation

  5. Model training

  6. Model evaluation and selection

  7. Output

The workflow is presented in Fig. 1. The following subsections elaborate on each of these stages.

Fig. 1 Proposed system design. The input file, oasis_longitudinal.csv, undergoes data visualization followed by feature selection, data transformation, model training, choosing the best-fit model, and output.

3.1 Input Dataset

The dataset was obtained from the Open Access Series of Imaging Studies (OASIS) project and covers 150 subjects aged between 60 and 96. The study focused on longitudinal MRI data of right-handed adults, with and without AD, and acquired three T1-weighted images per imaging session over 373 imaging sessions, which provide the imaging predictor variables. The dataset also provides non-imaging clinical predictors and demographic variables.

The features considered, which represent socio-demographic attributes and clinical predictors of the subjects, are listed in Table 1.

Table 1 Dataset description

3.2 Data Visualization

Data visualization was performed to gain statistical and graphical insights into the data. This helped us understand the data better, determine the correlation structure between the features, uncover outliers and inconsistencies, and highlight some of the key dependencies and patterns in the data distribution.

The graphs in Fig. 2 show how the distributions of some of the attributes differ between demented and non-demented subjects. The key findings from data visualization are:

Fig. 2 a Distribution based on age; b Distribution based on years of education; c Distribution based on mini-mental state examination; d Distribution based on normalized whole brain volume; e Distribution based on estimated total intracranial volume; f Distribution based on atlas scaling factor; g Distribution based on gender. Panels a to f are area graphs and panel g is a stacked bar graph. In the legends of the graphs, 0 represents non-demented and 1 represents demented.

  • Men are more likely to be demented than women.

  • The demented class contains more individuals aged 70–80 than the non-demented class.

  • The non-demented group has a higher brain volume than the demented group, as is evident from the graph.

  • The data show a connection between years of education and Alzheimer’s disease: demented subjects had fewer years of education.

  • In the MMSE graph, non-demented subjects are concentrated in the 26–30 range, whereas demented subjects are distributed throughout the range.

  • The demented group has a higher total intracranial volume than the non-demented group.

3.3 Feature Selection

The input data obtained from the OASIS project was first subjected to exploratory data analysis (EDA) to uncover patterns and inconsistencies in the data distribution. This paved the way for feature extraction and selection, a crucial task with a significant impact on the model’s performance. After detailed scrutiny of the data, based on the individual contribution of each feature and the effects of correlation between features, highly influential attributes such as age, gender, years of education, socioeconomic status, brain volume ratio and MMSE score were retained for further study.
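As a rough illustration of correlation-based screening, the sketch below ranks candidate features by the strength of their Pearson correlation with the class label. The toy values, the column names and the helper `pearson` are illustrative assumptions; they are neither the OASIS data nor necessarily the exact procedure followed in this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy rows: label 1 = demented, 0 = non-demented.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "MMSE": [30, 29, 28, 24, 22, 20],
    "EDUC": [16, 14, 18, 12, 12, 14],
    "Age": [72, 80, 75, 78, 74, 81],
}

# Rank features by the absolute strength of their correlation with the label.
ranking = sorted(
    ((name, pearson(values, labels)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

A ranking of this kind only captures linear, one-feature-at-a-time relationships, so it is best read as a first filter alongside the visual inspection described above.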

3.4 Data Transformation

Once the features had been chosen, the data was preprocessed. The first step was to identify missing data, for which two approaches were followed:

  • Drop the rows with missing values.

  • Perform imputation: replace the missing value with a value obtained from a chosen combining function such as the mean, median or mode.

Two further transformations were then applied:

  • Label encoding: the gender column contains categorical string data, which has to be numerically encoded. In this case, a simple encoding of M as 1 and F as 0 is used.

  • Feature scaling: different features have different scales and ranges of input values, which can result in erroneous models if they are not brought to a standard uniform range. Standardization was performed on every feature to bring it to a common scale.
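The encoding and scaling steps can be sketched as follows. This is a minimal standard-library illustration with made-up values; the function names `encode_gender` and `standardize` are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def encode_gender(value):
    """Map the gender string to the numeric code used in the text (M -> 1, F -> 0)."""
    mapping = {"M": 1, "F": 0}
    return mapping[value]

def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]

# Hypothetical toy columns.
gender = ["M", "F", "F", "M"]
age = [60, 70, 80, 90]

gender_encoded = [encode_gender(g) for g in gender]
age_scaled = standardize(age)
print(gender_encoded)
print([round(z, 3) for z in age_scaled])
```

When the data is split into training and test sets, the mean and standard deviation would normally be computed on the training split only and reused for the other splits, so that no information from the test data leaks into the model.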

Eight rows had missing values in the SES column. Both row dropping and median imputation were tried in order to compare model performance, and imputation gave better results. The median was chosen because SES is a discrete variable and the median is less affected by outliers than the mean.
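The two ways of handling the missing SES values can be sketched as below. The rows are made up for illustration, `None` is assumed to mark a missing value, and the helper names `drop_missing` and `impute_median` are hypothetical.

```python
import statistics

def drop_missing(rows, column):
    """Return only the rows in which `column` is present."""
    return [row for row in rows if row[column] is not None]

def impute_median(rows, column):
    """Return copies of the rows with missing `column` values set to the median."""
    observed = [row[column] for row in rows if row[column] is not None]
    median = statistics.median(observed)
    return [
        {**row, column: median if row[column] is None else row[column]}
        for row in rows
    ]

# Hypothetical toy rows; None marks a missing SES value.
rows = [
    {"id": 1, "SES": 2},
    {"id": 2, "SES": None},
    {"id": 3, "SES": 1},
    {"id": 4, "SES": 4},
    {"id": 5, "SES": None},
]

dropped = drop_missing(rows, "SES")
imputed = impute_median(rows, "SES")
print(len(dropped), [row["SES"] for row in imputed])
```

With an even number of observed values, `statistics.median` averages the two middle values, which can fall between two SES levels; `statistics.median_low` is an alternative that always returns an observed value.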

3.5 Model Training

This section deals with data segregation, one of the important stages of model building. The ultimate goal of the project is to develop a model that generalizes beyond the sample it was trained on and gives apt results for new, unseen instances. For this purpose, the clean data obtained in the previous stage is split into three sets: training, validation and test. The training set is used to develop the predictive model, the validation set is used to fine-tune the model’s parameters and the test set is used to evaluate the model’s performance. Evaluating on data held out from training helps to detect and avoid overfitting.
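A minimal sketch of such a three-way split is given below. The split fractions, the random seed and the function name `three_way_split` are assumptions, since the proportions actually used are not stated here.

```python
import random

def three_way_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle items and split them into training, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# 373 is the number of imaging sessions reported for the dataset;
# each integer stands in for one session.
train, val, test = three_way_split(range(373))
print(len(train), len(val), len(test))
```

Because the data is longitudinal, several sessions belong to the same subject. Splitting by subject rather than by session would keep one person from appearing in both the training and the test set; which of the two was done is not specified here.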

The models trained on the dataset are logistic regression, SVM, decision tree, random forest, AdaBoost, averaging, max voting, bagging and boosting. Five-fold cross-validation was performed to find the best hyperparameters for each model.
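The idea behind selecting a hyperparameter by k-fold cross-validation can be sketched as follows. The fixed-threshold rule on MMSE, the toy values and the helper names `k_fold_indices` and `select_best` are illustrative assumptions and stand in for the real models and parameter grids; folds are contiguous and unshuffled for brevity.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size

def select_best(candidates, score_fn, n_samples, k=5):
    """Return the candidate with the highest mean validation score over k folds."""
    def mean_score(candidate):
        scores = [score_fn(candidate, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_score)

# Hypothetical toy problem: pick the MMSE cut-off that best separates the classes.
mmse = [30, 29, 28, 27, 29, 26, 24, 22, 20, 25]
label = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = demented

def accuracy_at(threshold, train_idx, val_idx):
    # A fixed-threshold rule has nothing to fit, so train_idx is unused here;
    # a real model would be trained on train_idx before scoring on val_idx.
    hits = sum((mmse[i] < threshold) == bool(label[i]) for i in val_idx)
    return hits / len(val_idx)

best = select_best([22, 24, 27, 29], accuracy_at, n_samples=len(mmse))
print(best)
```

Each sample is used for validation exactly once, so the mean score uses all of the data while never scoring a model on samples it was fitted to.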

Because most neurodegenerative diseases are life-threatening and terminal, it is important for medical diagnostics to have a high rate of true positives so that AD is identified early in patients. It is equally important to keep the rate of false positives as low as possible, since a false alarm puts a person through mental distress and the financial burden of unnecessary medical therapy. Hence, the area under the receiver operating characteristic curve (AUC) was chosen as the main performance measure: it aggregates performance across all possible classification thresholds and reflects the ability of a classifier to distinguish between the two classes. The models were fine-tuned and evaluated based on their accuracy, recall and AUC scores.
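To make the threshold-independent nature of AUC concrete, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. The scores are made up and the function name `roc_auc` is hypothetical; this is one standard way of computing the quantity, not necessarily the routine used in this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney U) formulation.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities of the demented class (label 1).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
print(round(roc_auc(labels, scores), 3))
```

A value of 1.0 means every demented case is ranked above every non-demented case, while 0.5 corresponds to a ranking no better than chance.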

The algorithms whose performance was compared are listed in Table 2.

Table 2 Models trained
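Two of the ensemble schemes among the trained models, max voting and averaging, can be sketched as follows. The base-model outputs are made up, the 0.5 decision threshold is an assumption, and the function names `max_voting` and `averaging` are hypothetical.

```python
from collections import Counter

def max_voting(predictions):
    """Combine hard class predictions by majority vote, one sample at a time."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest count;
        # ties go to the label encountered first.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def averaging(probabilities, threshold=0.5):
    """Average predicted probabilities across models, then threshold them."""
    combined = []
    for probs in zip(*probabilities):
        mean_prob = sum(probs) / len(probs)
        combined.append(1 if mean_prob >= threshold else 0)
    return combined

# Hypothetical outputs of three base models for four samples (1 = demented).
hard = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
soft = [
    [0.9, 0.4, 0.6, 0.1],
    [0.8, 0.7, 0.3, 0.2],
    [0.4, 0.6, 0.7, 0.3],
]
print(max_voting(hard))
print(averaging(soft))
```

Max voting only needs class labels from each base model, whereas averaging needs probability estimates and lets a confident model outweigh two hesitant ones.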

3.6 Model Evaluation and Selection

Model evaluation assesses the performance of each ML model. To check for correct predictions, accuracy is used together with metrics such as recall, F1 score and area under the ROC curve (AUC), obtained from the confusion matrix (CM).

From Table 2, it is evident that random forest has the highest accuracy, recall, F1 score and AUC, at 86.84%, 80%, 86.49% and 87.22%, respectively, and outperforms all other algorithms. Its high recall and accuracy align with our goal of building a model with a low number of false negatives while maintaining a good balance between precision and recall. The confusion matrix of the model trained using the random forest algorithm is shown in Fig. 3.

Fig. 3 Random forest model’s confusion matrix. A 2 by 2 confusion matrix of true versus predicted labels: true and predicted demented, 17; true and predicted non-demented, 16; true non-demented and predicted demented, 4; true demented and predicted non-demented, 1.

Hence, random forest is selected as the best classifier to build a predictive model that classifies a case as demented or non-demented.
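The relation between the confusion matrix and the reported metrics can be checked with a short calculation. The four counts come from Fig. 3, but which class is treated as positive is an assumption made here: taking 16 true positives, 4 false negatives, 1 false positive and 17 true negatives reproduces the values reported for random forest. The AUC line additionally assumes that the area was computed from hard 0/1 predictions, which is not confirmed by the text; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Derive common metrics from the four cells of a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)        # true positive rate
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)   # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    # For hard 0/1 predictions the ROC curve has a single operating point,
    # and the area under it equals the mean of recall and specificity.
    auc_single_point = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
        "auc": auc_single_point,
    }

# Cell counts from Fig. 3 (17, 16, 4, 1). The assignment of cells to the
# positive class is an assumption, chosen so that recall comes out at 80%.
metrics = binary_metrics(tp=16, fn=4, fp=1, tn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 86.84%, recall: 80.00%, precision: 94.12%,
# specificity: 94.44%, f1: 86.49%, auc: 87.22%
```

Under this assignment the test set contains 38 cases, and the four false negatives, rather than the single false positive, are the main source of error.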

4 Future Enhancements

Alzheimer’s disease can arise from various causes, not all of which can yet be diagnosed clinically. Therefore, it is important that the model generalizes well to a large and varied population. Although the accuracy of the proposed model is quite good, it can be enhanced by overcoming the present limitations of the project.

  • Our study is restricted to a small population. Increasing the size of the dataset should improve the predictive capability of the model by exposing it to more patterns.

  • In our model, all the features are equally weighted. Weighting each feature according to its influence could improve the model.

  • Finding and including a broader set of relevant features could also improve the model’s performance.

5 Conclusion

In this project, various machine learning techniques were tested for their potential to efficiently support the prognosis of Alzheimer’s disease. With a classification accuracy of 86.84%, the proposed model can serve as a tool for initial screening ahead of further medical diagnosis. The proposed framework learns the patterns that characterize people at risk of Alzheimer’s disease from significant features, with missing values imputed using the median, and uses a random forest classifier, which gave the highest classification accuracy of all the classifiers tested, to automate the early diagnosis of Alzheimer’s disease by classifying instances as demented or non-demented.