
1 Introduction

Alzheimer’s Disease (AD) is a catastrophic and fatal neurodegenerative disorder that leads to the death of neurons and slowly tears down cognitive capabilities. The memory decay it causes is not a result of normal ageing but of deposits of amyloid plaque proteins in the brain cells. The condition can affect people of any age after 40, but it appears far more frequently in the elderly. It is a type of dementia and causes problems with thinking power, memory skills, and behavioural patterns. The indications of AD usually develop slowly and aggravate in the later stages. Compared with other neurological diseases, AD has been a considerable cost burden in Asia, North America, and worldwide [1].

According to research studies, it is estimated that by the year 2050 the count of affected people could reach 0.64 billion [2]; that is, by 2050 one out of every eighty-five people worldwide may be affected by AD [3]. Pinpointing this disease at its beginning stage makes a remarkable difference in the patient’s life expectancy. For indications of AD at its early stage, biomarkers have a crucial role, as they can measure what is happening inside the body through imaging results or laboratory test results. Imaging tests are significant biomarkers for the identification of AD and the classification of its current stage. Commonly applied tests are cerebrospinal fluid (CSF) analysis, magnetic resonance imaging (MRI), single-photon emission computed tomography (SPECT), CT scans, positron emission tomography (PET), and the electroencephalogram (EEG) signal [4].

The temporal lobe of the brain is the primary focus area for identifying AD at its onset. The hippocampus and the cerebral cortex of the temporal lobe are responsible for memory, learning, emotions and feelings, language processing, sensation, etc. Progression of Alzheimer’s Disease can lead to thinning of the cerebral cortex and loss of volume in the hippocampal area. Biomarkers are the tools for finding the changes in these two areas and help the practitioner diagnose the disease. Empowered, intelligent technologies have brought noticeable changes to the health care sector [5].

The machine learning approach has shown a clear-cut advantage in diagnosing various diseases by analyzing medical data. Researchers have found that machine learning techniques help in diagnosing and classifying Alzheimer’s Disease. The most widely used classifiers are SVM, KNN, the Decision Tree algorithm, and Naïve Bayes, which are helpful in clinical diagnosis.

The proposed work analyzes brain MRI data using the scattering wavelet transform with machine learning models for detecting and classifying Alzheimer’s Disease. Machine learning research with neuroimaging datasets has become an indispensable diagnostic aid for the segmentation and classification of brain MRI. In a person affected with AD, the hippocampal area starts to shrink, the ventricles enlarge, and the cortical thickness reduces [6]. These areas act as the communication pathway between the brain and the rest of the human body. AD breaks the links between neurons and synapses so that they can no longer communicate, leading to severe cognitive disabilities [7, 8], and the MRI can highlight the affected area in low intensity. Figure 1 shows a set of brain images affected by AD at its different stages.

Fig. 1. Brain MRIs depicting the different phases of AD: (a) Healthy Control (b) Very Mild AD (c) Mild AD (d) Moderate AD

The variations in MRI help to distinguish Alzheimer’s disease from healthy controls, but only extensive in-depth knowledge and experience can distinguish a healthy subject’s MRI from an early AD MRI. A productive, robust, automated computer-aided machine learning model can therefore provide researchers, scientists, and medical practitioners with immense help in diagnosing and classifying the disease [9], and it could ultimately be a timely assist to medical personnel in providing proper treatment to AD patients. In the proposed work, four machine learning models are selected to identify the different stages of the disease. The experiments evaluated the models’ performance on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database [10] and clinical datasets, which provide T1-weighted MRI scans.

The paper is organized in the following manner: Sect. 2 summarizes related works. Sections 3 and 4 explain the dataset and the feature extraction technique used in our work. Section 5 illustrates the workflow model and the experimental methods used in the paper. Section 6 describes the performance measures used to evaluate the models, and Sect. 7 discusses the results acquired. Finally, the paper ends with a conclusion followed by references.

2 Related Work

In medical imaging, the detection of AD through MRI is relevant to maintaining a person’s cognitive capability and to public health. AD is considered a public health issue, and research indicates that 5.5 million people in the age bracket 75–90 years in the United States of America are AD patients [11]. AD causes problems with memory, thinking capabilities, and other daily life activities. It is a slowly progressive disease that destroys nerve cells; eventually, the person loses control over their life as all parts of the brain become severely affected [12]. Imaging techniques are powerful tools for identifying AD symptoms in the brain, and combining these images with machine learning techniques enables the early diagnosis of AD more efficiently than manual systems. The work in [13] focused on early detection of Alzheimer’s disease from MRI using image processing techniques: brain atrophy, a valuable measure for diagnosing the early stages of AD, is analyzed and computed through Wavelet, Watershed, and K-Means algorithms. The paper [14] introduced an AD detection technique using gradient-echo plural contrast imaging (GEPCI) with MRI, as it can find the brain tissues damaged by AD. In the GEPCI method, the affected area appears with enhanced resolution in the MRI, a supporting tool for the practitioner to identify the affected tissues.

The paper [15] developed a method to diagnose AD by using a probabilistic neural network (PNN) with imaging techniques such as MRI and PET. In the first stage, the brain volume and atrophy rate are computed, followed by feature extraction for the classification. PNN surpasses SVM and KNN in terms of performance.

The paper [16] introduced a method of AD detection using MRI scans, applying Principal Component Analysis (PCA) and Singular Value Decomposition to extract features from the images. Classification is achieved by fusing SVM and Decision Tree algorithms to acquire remarkable results. The paper [17] introduced a multi-textural (MTL) framework for extracting features, in which MTL computes structural information from MRI. The researchers performed various texture-grading processes on the extracted data through fusion methods. This novel method has shown significant results in performance.

In [18], feature extraction was carried out through wavelet entropy and Hu moments, and classification was done with an SVM using the extracted proximal eigenvalues. The classification accuracy increased with the radial basis function (RBF) kernel of the SVM. In [19], MRI images undergo voxel preselection and brain parcellation before being input to the networks. Voxel preselection removes voxels with low significance to diminish the computational cost, and brain parcellation splits the brain region into patches with the help of AAL (Automated Anatomical Labelling). The paper developed two Deep Belief Network (DBN) alternatives. The first is DBN-voter, an ensemble of DBN classifiers with a voting strategy; four voting techniques were used for prediction: majority voting, weighted voting, classification fusion using SVM, and a classifier using a DBN. The second is feature extraction using DBN with SVM, called FEDBN-SVM. The training samples are extracted voxels that undergo a fusion process by the SVM, which computes the features’ relevance at each stage and removes the need for the voting phase. Both networks showcased significant results in prediction and classification.

3 Dataset

The data applied in our work combines the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database and clinical datasets. The ADNI initiative aims to develop indicators for the timely diagnosis of Alzheimer’s disease; it collects clinical, imaging, biochemical, and genetic biomarkers to detect and classify AD. It was launched in 2004 under the guidance of Dr. Michael W. Weiner, MD [20]. This work considered a whole set of 3209 Healthy Control (HC), 2240 Very Mild (VM), 896 Mild (M), and 64 Moderate (Mo) participants for identifying the different stages of AD among the early-onset category (see Table 1 for demographic details). The dataset contains patients’ demographic information such as age, gender, and handedness.

Table 1. Dataset demographics for ADNI.

4 Feature Extraction

Wavelets are mathematical functions that divide data into frequency components and resolve each component at its matching scale. Wavelet scattering is a cascade of wavelets that achieves translation invariance and is stable to small deformations (diffeomorphisms), preserving the information needed for effective classification. Wavelet scattering is therefore an effective tool for feature extraction and for obtaining accurate data representations; the extracted coefficients can be used with most classification algorithms, including deep neural networks, to acquire higher performance. The wavelet scattering transform has three primary operations, convolution, nonlinearity, and averaging, described in Fig. 2 for an input signal \(Y=[y_1,y_2,y_3,\ldots ,y_n]\), here a 30-dimensional vector. The two main components of the Wavelet Scattering Transform (WST) are the wavelets and the scaling function. The first step calculates the wavelet convolutions, which are translation covariant; the scattering transform then computes the non-linear invariants with the help of the modulus and a low-pass filter.
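To make these three operations concrete, the following minimal Python sketch (our illustration, not part of the original work) computes one first-order scattering path \(|y*\psi |*\varphi \) for a short 1-D signal; the Morlet-like band-pass filter and the chosen scales are illustrative assumptions.

```python
# Minimal sketch of one first-order scattering path |y * psi| * phi on a 1-D signal.
# The Morlet-like wavelet and the filter scales are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def morlet_like(n, scale):
    """Complex band-pass filter: Gaussian envelope times a complex exponential."""
    t = np.arange(-n // 2, n // 2)
    envelope = np.exp(-t**2 / (2.0 * scale**2))
    return envelope * np.exp(1j * t / scale)

y = np.random.randn(30)                  # input signal Y = [y1, ..., y30]
psi = morlet_like(len(y), scale=2.0)     # wavelet psi_i
w = np.convolve(y, psi, mode='same')     # 1) convolution y * psi (translation covariant)
u = np.abs(w)                            # 2) nonlinearity: modulus |y * psi|
s = gaussian_filter1d(u, sigma=4.0)      # 3) averaging with the low-pass filter phi_i
print(s.shape)                           # first-order scattering coefficients
```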

Fig. 2. Wavelet scattering transform processes for the input signal Y

In this paper, the scattering wavelet transform is used for feature extraction from the MRI images. The transform decomposes the given input y from the lower frequency to the higher frequency levels: \({{w}_{y}}=y*{{\varphi }_{i}}\left( u \right) \) gives the lower-frequency information of the image y at scale i, and \({{w}_{y}}=y*{{\psi }_{i,d}}(u)\) gives the higher-frequency information, where i is the scaling factor and d is the direction. Here \({{\varphi }_{i}}(u)\) is the scaling function, and \({{\psi }_{i,d}}(u)\) is the directional wavelet form. The transformation can be expressed as:

$$\begin{aligned} {{w}_{y}}=\left[ \begin{matrix} y*{{\varphi }_{i}}\left( u \right) \\ y*{{\psi }_{i,d}}\left( u \right) \\ \end{matrix} \right] \end{aligned}$$
(1)

Even though the wavelet transform can extract high-frequency features in different directions, it does not have the property of translation invariance, because the convolution operation is only translation covariant [21,22,23,24]. Invariance can be achieved by applying a non-linear operation, the modulus of the high-frequency coefficients \(|y*{{\psi }_{i,d}}(u)|\), followed by the low-pass filter \({{\varphi }_{i}}(u)\), which gives the invariant feature

$$\begin{aligned} |y*{{\psi }_{i,d}}(u)|*{{\varphi }_{i}}(u) \end{aligned}$$
(2)

The obtained feature is invariant across multiple directions and scales, but it loses its high-frequency content. To recover the lost frequencies, the wavelet decomposition is applied again, and the modulus and low-pass filtering operations are repeated to keep the feature coefficients stable, which gives

$$\begin{aligned} ||y*{{\psi }_{i,d}}(u)|*{{\psi }_{i+1,d}}(u)|*{{\varphi }_{i}}(u) \end{aligned}$$
(3)

Increasing the number of iterations produces more invariants. In this work, each input image is normalized to a size of 258 \(\times \) 258, and a scattering wavelet is applied with scattering level \(n=1\), with \(d=8\) scattering directions at each scale and wavelet decomposition level \(I=2\). The wavelet scattering can then be expressed as

$$\begin{aligned} {{w}_{y}}=\left[ \begin{matrix} {{T}_{n}} \\ {{S}_{n}} \\ \end{matrix} \right] \end{aligned}$$
(4)

The subscript \(n\) in Eq. (4) represents the level of the scattering wavelet, i.e. the number of scattering wavelet decompositions applied to recover the high frequencies. The scattering propagation operator \({{T}_{n}}\) is given by

$$\begin{aligned} {{T}_{n}}=||y*{{\psi }_{i,d}}(u)|*\cdots *{{\psi }_{{{i}_{n}},d}}(u)| \quad \text {when } n\ge 1; \qquad {{T}_{0}}=y \quad \text {when } n=0 \end{aligned}$$
(5)

and the scattering coefficient

$$\begin{aligned} {{S}_{n}}={{T}_{n}}*{{\varphi }_{i}} \end{aligned}$$
(6)

The scattering wavelet transform obtains a robust representation of MRI images by suppressing noise features while maintaining the discriminability between affected and normal regions. Although the structures of a wavelet scattering network and a Convolutional Neural Network are alike, they differ in their filter weights and computational complexity: in the scattering network, the filter weights are known and fixed rather than learned. Because the coefficient energy decreases as the layer level increases, the model restricts the scattering network used for feature extraction to two layers, which also reduces the computational complexity significantly [25,26,27,28].
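As an illustration of this feature-extraction stage, the following Python sketch (an assumption of ours, not the authors’ MATLAB implementation) computes first-order, scattering-style coefficients for one MRI slice using Gabor band-pass filters at 8 orientations followed by a Gaussian low-pass average; the filter frequency, smoothing scale, and downsampling factor are illustrative.

```python
# Sketch of first-order, scattering-style feature extraction for one MRI slice.
# Hypothetical stand-in for the paper's pipeline; all parameters are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.filters import gabor

def scattering_features(image, n_orientations=8, frequency=0.25, sigma_phi=8, stride=8):
    """|image * psi_theta| * phi, pooled, for each of n_orientations directions."""
    feats = []
    # Zeroth-order coefficient: low-pass average of the image itself.
    feats.append(gaussian_filter(image, sigma_phi)[::stride, ::stride].ravel())
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        real, imag = gabor(image, frequency=frequency, theta=theta)  # y * psi
        modulus = np.hypot(real, imag)                               # |y * psi|
        smoothed = gaussian_filter(modulus, sigma_phi)               # * phi
        feats.append(smoothed[::stride, ::stride].ravel())           # average/downsample
    return np.concatenate(feats)

# Example: a normalized 258 x 258 slice (random placeholder here).
slice_258 = np.random.rand(258, 258)
features = scattering_features(slice_258)
print(features.shape)
```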

5 Experimental Methods

In this work, the classification of the different stages of AD is achieved with four machine learning models: SVM, KNN, Decision Tree, and Naïve Bayes. In addition, the paper aims to investigate the accuracy of a multiclass classifier for the disease classification. Figure 3 illustrates the model’s operational mechanism.

Fig. 3. Workflow diagram for AD categorization

5.1 Support Vector Machine (SVM)

The SVM is a well-known method for classification and regression, and its usage has been beneficial in many applications, especially in the medical field for disease prediction. Using non-linear or linear kernel functions, SVM maps predictor vectors into a high-dimensional space and classifies the data with a hyper-plane. The n training samples of a classification problem can be expressed as \(Y=\{(s_1,t_1),(s_2,t_2),\ldots ,(s_n,t_n)\}\), where \(s_i \in {{R}^{d}}\) are feature vectors and \(t_i\) the class labels [29, 30]. For multi-class classification, the problem is divided into multiple binary classifications using the one-vs-one scheme, with \(t_i \in \{-1, 1\}\) in each sub-problem; each binary classifier separates the data points of one class from those of another. The number of classifiers required for this approach is given by \(m(m-1)/2\), where m is the number of classes. The classification gains accuracy by maximizing the margin, expressed as \({{\min }_{w,b}}({{W}^{T}}W/2)\). SVM’s decision function is denoted by \(f(y)=sign({{W}^{T}}y+b)\), where W is the weight vector, y is the input, and b is the bias.
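A minimal scikit-learn sketch of this scheme is given below (an illustrative stand-in, not the paper’s MATLAB model); SVC handles multi-class problems with the one-vs-one strategy, so it builds the \(m(m-1)/2\) pairwise classifiers described above. The feature dimensions and labels are placeholders.

```python
# Minimal multi-class SVM sketch via one-vs-one (illustrative stand-in only).
import numpy as np
from sklearn.svm import SVC

X = np.random.rand(120, 50)              # placeholder feature vectors s_i
y = np.random.randint(0, 4, size=120)    # class labels t_i for m = 4 stages

clf = SVC(kernel='rbf', decision_function_shape='ovo').fit(X, y)
m = len(np.unique(y))
print(clf.decision_function(X[:1]).shape)  # (1, m*(m-1)/2) pairwise scores
print(m * (m - 1) // 2)                    # number of one-vs-one classifiers
```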

5.2 K-Nearest Neighbour

KNN is a classifier algorithm whose prediction is based on the nearest neighbours instead of building an explicit model; it is also called a lazy learner. The KNN prediction technique locates the k nearest neighbours of a data sample. The distance between neighbours is calculated with the Euclidean function, which measures the similarity between two points. Given data \(D=(x_1,x_2,\ldots ,x_n)\), a set of labelled samples, the nearest-neighbour classifier assigns a test point C the label associated with its closest neighbour in D. The k-nearest-neighbour classifier classifies C by assigning it the label most frequently represented among the k closest samples. Here, k is the algorithm’s crucial parameter. The distance between a training point \(x_i\) and a test point x is computed with the Euclidean distance \({{d}_{E}}(x,{{x}_{i}})=\sqrt{\sum _{j=1}^{n}{{(x_j-x_{ij})}^{2}}}\) [31, 32].
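The nearest-neighbour rule can be sketched in a few lines, as below (our illustration only; the array sizes are placeholders and the example uses k = 1).

```python
# Minimal 1-nearest-neighbour sketch using Euclidean distance (illustrative only).
import numpy as np

def predict_1nn(X_train, y_train, x_test):
    """Return the label of the training point closest to x_test in Euclidean distance."""
    d = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))   # d_E(x, x_i) for every x_i
    return y_train[np.argmin(d)]

X_train = np.random.rand(100, 50)             # labelled samples D = (x1, ..., xn)
y_train = np.random.randint(0, 4, size=100)
x_test = np.random.rand(50)                   # test point C
print(predict_1nn(X_train, y_train, x_test))
```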

5.3 Decision Tree

The decision tree algorithm divides data samples based on a determined parameter, maximizing the separation of the data and organizing the output in a tree structure. It is used for solving both regression and classification problems. The method forms a binary tree whose nodes are decision nodes and classification (leaf) nodes [33]. The algorithm tries to find the best categorical or numeric attribute for the feature split based on the conditions present in the data, and identifying this attribute is one of the significant challenges in decision trees [34]. Attribute selection can be achieved through the Entropy or Gini Impurity measures. The Gini Impurity is calculated as \(\sum _{i=1}^{m}{{f}_{i}}(1-{{f}_{i}})\), where m is the set of distinct labels and \({{f}_{i}}\) is the frequency of label i at the given node, and the Entropy is calculated as \(-\sum _{i=1}^{m}{{f}_{i}}\log ({{f}_{i}})\).
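Both attribute-selection measures can be computed directly from the label frequencies \(f_i\) at a node, as in this short illustrative sketch (not the paper’s implementation; base-2 logarithms are an assumption).

```python
# Sketch of the two split criteria from the label frequencies f_i at a node.
import numpy as np

def gini_impurity(labels):
    _, counts = np.unique(labels, return_counts=True)
    f = counts / counts.sum()
    return float(np.sum(f * (1 - f)))          # sum f_i * (1 - f_i)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    f = counts / counts.sum()
    return float(-np.sum(f * np.log2(f)))      # -sum f_i * log(f_i), base-2 here

node_labels = np.array([0, 0, 1, 1, 1, 2])     # labels reaching a candidate node
print(gini_impurity(node_labels), entropy(node_labels))
```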

5.4 Naïve Bayes

Naïve Bayes is a simple and efficient learning algorithm for machine learning. It is one of the recommended classification algorithms among the other classification methods due to its strong independence assumption. It is a supervised learning algorithm based on the Bayes theorem [35]. Given a set of known probabilities, it estimates the probability of a class from the occurrence of each feature, assuming the features are independent of one another. The Bayes theorem calculates an event’s probability from the probability of an event that has already occurred. Mathematically it is stated as \(P(M|N)=P(M)\,P(N|M)/P(N)\), where \(P(M|N)\) is the posterior probability of M given N and P(M) is the prior probability of M.
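A minimal Gaussian Naïve Bayes sketch (a scikit-learn stand-in, not the paper’s implementation) exposes the prior and posterior probabilities the classifier works with; the data are random placeholders.

```python
# Minimal Gaussian Naive Bayes sketch: Bayes' rule with conditional independence.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.random.rand(120, 50)              # placeholder features
y = np.random.randint(0, 4, size=120)    # placeholder class labels

nb = GaussianNB().fit(X, y)
print(nb.class_prior_)           # prior probabilities P(class) estimated from y
print(nb.predict_proba(X[:1]))   # posterior P(class | features) for one sample
```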

6 Performance Measures

Performance measures are used to evaluate each model’s classification performance. They are calculated from the confusion matrix, which visualizes the performance of each model for binary and multiclass classification [36]. A True Positive (TP) is a case the model predicts as positive that is actually positive. A False Positive (FP) is a case the model predicts as positive but that is actually negative. A False Negative (FN) is a case the model predicts as negative but that is actually positive. Finally, a True Negative (TN) is a case the model predicts as negative that is actually negative. The performance metrics based on these counts are calculated as

$$\begin{aligned} \text {Accuracy =}\left( \frac{\text {TP+TN}}{\text {TP+TN+FP+FN}} \right) \end{aligned}$$
(7)
$$\begin{aligned} \text {F1-Score =}\left( \frac{\text {2}\times \text {TP}}{\text {2}\times \text {TP+FP+FN}} \right) \end{aligned}$$
(8)
$$\begin{aligned} \text {Recall =}\left( \frac{\text {TP}}{\text {TP+FN}} \right) \end{aligned}$$
(9)
$$\begin{aligned} \text {Precision =}\left( \frac{\text {TP}}{\text {TP+FP}} \right) \end{aligned}$$
(10)
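These metrics can be computed directly from a multi-class confusion matrix by treating each class as the positive class in turn and macro-averaging, as in the following illustrative sketch (the matrix values are hypothetical, not results from the paper).

```python
# Per-class precision/recall/F1 and overall accuracy from a 4-class confusion matrix.
import numpy as np

cm = np.array([[50,  2,  0, 0],   # hypothetical confusion matrix
               [ 3, 45,  2, 0],   # rows: actual class, columns: predicted class
               [ 0,  1, 20, 1],
               [ 0,  0,  1, 5]])

tp = np.diag(cm).astype(float)
fp = cm.sum(axis=0) - tp          # predicted as the class but actually another
fn = cm.sum(axis=1) - tp          # actually the class but predicted as another

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)
accuracy = tp.sum() / cm.sum()

print(accuracy, precision.mean(), recall.mean(), f1.mean())  # macro averages
```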

7 Results and Discussion

Our experiments were done in MATLAB 2019b using an Intel i7 processor and 16 GB of RAM. The proposed method combines the scattering wavelet transform for extracting features with machine learning techniques for classifying the disease stages as non-demented, very mild AD, mild AD, and moderate AD. The MRI images were normalized, their features extracted and reduced, and the selected machine learning models, SVM, KNN, Naive Bayes, and Decision Tree, were trained and tested with 10-fold cross-validation. The model used Gabor wavelets for the wavelet decomposition, and its lowpass filter \(\phi \) is a Gaussian function. The input images have a size of 258 \(\times \) 258, and the invariance scale is 129. The constructed network is a two-layer network to retain the coefficient energy, and the number of rotations per wavelet per filter bank in the scattering network is fixed at 8 for both layers. The scattering coefficients are downsampled with the lowpass filter to acquire an eight-sample window for each scattering path. This continuous sequence of convolution, non-linearity, and pooling enables the scattering wavelet network to extract characteristics from the brain MRI.
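The evaluation protocol can be sketched as follows (a hypothetical scikit-learn analogue of the MATLAB experiments; the feature matrix and labels below are random placeholders, not the ADNI data).

```python
# Hypothetical end-to-end sketch: four classifiers on scattering features,
# each evaluated with 10-fold cross-validation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X = np.random.rand(400, 9801)             # scattering features per image (placeholder)
y = np.random.randint(0, 4, size=400)     # 0=HC, 1=very mild, 2=mild, 3=moderate

models = {
    'SVM (RBF)': SVC(kernel='rbf', gamma='scale'),
    'KNN (k=1)': KNeighborsClassifier(n_neighbors=1),
    'Decision Tree': DecisionTreeClassifier(criterion='entropy'),
    'Naive Bayes': GaussianNB(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)   # 10-fold accuracy
    print(f'{name}: mean accuracy = {scores.mean():.3f}')
```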

The SVM is a commonly used machine learning approach for the classification of binary and multi-class data. Non-linearly mapping the data to a high-dimensional space with a kernel function yields an optimized classification plane that separates the individual classes. This research work used the radial basis function (RBF) kernel. Figure 4 presents the confusion matrix of the SVM classifier, which shows a precise match between the true and predicted values. SVM with RBF produced an average prediction accuracy of 98.10%, as depicted in Table 2.

Fig. 4. SVM confusion matrix obtained for the ADNI dataset

The second machine learning classifier used in this work is the K-Nearest Neighbour (KNN) algorithm, for which the value of k is fixed at 1. Figure 5 illustrates the confusion matrix obtained by the KNN classifier. The classifier matched the true and predicted values well when classifying the moderate and mild stages but lagged in the very mild stage. KNN acquired an average accuracy of 96.3%.

The third method used for classification is the Decision Tree (DT) technique, which builds the model as a tree structure; the algorithm used is C4.5. It uses the Information Gain mechanism to construct the tree, which allows computing the degree of information acquired during the classification process. Figure 6 shows the confusion matrix for the Decision Tree classifier. It achieves only an average match between true and predicted values, as it could not classify the mild and very mild stages of AD accurately. The classifier classified the phases of AD with an average accuracy of 89.0%, as depicted in Table 2.

The final machine learning classifier used is Naïve Bayes, a probabilistic method in which the Bayes network tries to trace the data edges of the given distribution. The obtained details are analysed in the classifier based on the conditions provided on the testing data, and each test sample is assigned to the class for which the classifier attains the maximum probability. Figure 7 shows the confusion matrix for Naïve Bayes. The classifier lacked accuracy across all stages and achieved only an average accuracy of 88.7%. The classifiers’ accuracy comparison is highlighted in Table 2.

Fig. 5. KNN confusion matrix obtained for the ADNI dataset

Fig. 6. Decision tree confusion matrix obtained for the ADNI dataset

Fig. 7. Naive Bayes confusion matrix obtained for the ADNI dataset

Table 2. Average accuracy obtained for the AD classification.
Table 3. Model performance measures obtained for the AD classification

The performance metrics for the multi-class classification with SVM, KNN, Naïve Bayes, and Decision Tree are depicted in Table 3. The study indicates that the SVM model stands out in the classification of AD stages in comparison with the other classifiers. The SVM achieved an average Accuracy of 99%, a Precision of 98.25%, a Recall of 98.5%, and an F1 score of 98.5%, once again revealing its capability in image classification with a high performance rate. In this work, the SVM has paved the way towards insights for medical health care applications, especially with medical images. The model used ten-fold cross-validation for training the SVM with features extracted by the scattering wavelet transform. Using the scattering transform with SVM provides stability to deformation, a powerful property for image classification and for obtaining high accuracy.

To evaluate the proposed method’s efficiency, its results are quantitatively compared with state-of-the-art findings, as shown in Table 4.

Table 4. State of the art recognition accuracy for ADNI dataset.

8 Conclusion

The recent evolution in biomedical engineering shows that the analysis of medical images is a significant area of research in the current arena, and the application of machine learning algorithms to the various types of medical images is one of the core reasons for this evolution. In this work, the paper used four machine learning techniques, viz. SVM, K-NN, Naïve Bayes, and Decision Tree, for analyzing and classifying the stages of early-onset Alzheimer’s disease. From the results, it can be inferred that SVM-based models are suitable for prediction and classification owing to their robustness.

The results of this study demonstrate that the scattering wavelet transform with the SVM model outperforms the other classification models across the performance metrics considered. While the bulk of studies are concerned with binary classification, the proposed work addresses multi-class classification. The study is useful for the early detection of Alzheimer’s disease in individuals aged 40 to 65, a category in which AD has a significant impact on life expectancy. Moreover, this research work can classify the disease into four stages: very mild, mild, moderate, and healthy controls. Though the work concentrated only on the AD dataset, the models can be expected to work successfully for prediction and classification in other medical domains. In the future, the model can be extended to predict the time required to progress from the very mild stage to the moderate stage, which could help the practitioner and the patients follow up with proper treatment.