Introduction

Neurodegenerative diseases are a collection of neurological disorders in which neurons in the central nervous system are damaged or die, causing significant disability and, in the worst case, death. Figure 1 shows the major neurological disorders, which affect all age groups. They are typically diagnosed in old age, although disease onset can occur earlier. Their prevalence has risen dramatically in recent years, and this trend is projected to continue as the world's population ages. Neurodegenerative illnesses are difficult to manage and can be costly, since their causes are unknown and there is no recognized cure; current treatments focus on reducing symptoms.

Fig. 1
A circle chart of neurological disorders with Alzheimer's disease, Parkinson's disease, traumatic brain injury, Lewy body disease, and dementia.

Neurological disorders

The use of machine learning algorithms in medical and scientific research has received considerable attention in recent years. Over the last decade, new technologies have made it possible to rapidly accumulate patient data, such as ultrasonography and MRI readouts; omics profiles of biological samples; electronically collected clinical, behavioral, and activity data; and social media-derived information. The number of characteristics (or variables) recorded in each observation can occasionally surpass the total number of observations in these large health datasets, making them high-dimensional. Advances in machine learning can greatly assist diagnosis and monitoring, including illness onset detection, disease characterization, quantification of disease progression, and differential diagnosis. ML approaches have been used to assist physicians with computer-aided diagnosis (CAD) of these diseases [1, 2].

Classes of machine learning algorithms:

  • Supervised learning

  • Unsupervised learning

  • Reinforcement learning (Fig. 2)

Fig. 2
A flow diagram of machine learning, which is divided into supervised learning, unsupervised learning, and reinforcement learning, along with its corresponding examples.

Divisions of machine learning algorithms

Supervised Learning

These approaches are the ones most typically used to analyze data linked to neurodegenerative diseases. A labeled dataset is required to train the model. These labels often necessitate human curation or professional evaluation; for example, a radiologist is required to label a series of radiographs or to delineate the size of a brain area on an MRI scan, and a neuropathologist is needed for pathological labeling. The computer may then use the trained model to predict the label for fresh, unlabeled data from the input characteristics. It can be difficult to collect large enough volumes of correct labels for supervised machine learning. Figure 3 shows the supervised learning model.

Classification is the process of finding rules by which new objects are assigned to predefined categories, and it requires two steps. The first step is building the classifier with the help of a training dataset that has class attributes. The second step is analyzing the performance of the classifier using a test dataset. Supervised learning is a process in which prediction is iteratively refined using training data [24, 25]. Some commonly used supervised algorithms are the decision tree, linear classification, and Naive Bayes.

The dataset must first be preprocessed, a task that transforms the raw data into a finished dataset that can be fed into the model. The data cleaning, data transformation, feature selection, and feature mapping stages of the preprocessing phase all contribute to the classifier's accuracy. Data cleaning handles any missing values in the dataset, and data transformation ensures that the data values fall within a small range. Feature selection is a significant step that restricts the model to a finite set of features that can provide a higher detection rate, and the final step, feature mapping, helps in finding a decision region that maximizes the separation between the given classes in the newly mapped space. The training phase typically uses at least 70% of the dataset to create a model that can categorize and predict the correct label of the testing set. Data evaluation is then used to analyze the performance of the algorithm on the neurological disorder dataset; it includes accuracy, sensitivity, specificity, and error rate. Much of the literature on supervised classification of neurological disorders reports low specificity, about 0–54%, with accuracy of only about 80%.
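The following is a minimal sketch of this pipeline in scikit-learn, assuming a synthetic stand-in for a real neurological-disorder dataset; the feature matrix, labels, and 70/30 split are placeholders for illustration only.

```python
# Sketch of the supervised pipeline: transformation, 70% training split,
# and evaluation by accuracy, sensitivity, specificity, and error rate.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # 200 subjects, 10 extracted features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical disorder labels

# Data transformation: scale values into a small range.
X = MinMaxScaler().fit_transform(X)

# Training phase: roughly 70% of the data builds the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)

clf = DecisionTreeClassifier().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Data evaluation on the held-out test set.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("accuracy   :", accuracy_score(y_test, y_pred))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("error rate :", 1 - accuracy_score(y_test, y_pred))
```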

Fig. 3
A flow diagram depicts how to evaluate the result from the neurological disorder dataset using data preparation, dimension reduction, feature selection, the classifier algorithm, and classification.

General block of supervised learning model

Decision Tree

The decision tree (DT) is one of the most well-known and widely used machine learning methods. It mimics the human ability to reason toward a decision using a tree-structured technique. The root node, internal nodes, branches, and leaf nodes are its four basic components: the root connects the whole tree, leaf nodes represent classes, branches represent outcomes, and internal nodes represent tests on attributes. The classification rules are the paths from the root to the leaves, and the steps involved in predicting the class of a given dataset are given below. Figure 4 shows the basic flow of the decision tree algorithm [31].

  • Step 1: Start from the root node of the tree, which contains the complete dataset.

  • Step 2: Select the attribute/feature value for the root node. Create a branch for each possible attribute/feature value.

  • Step 3: Generate the decision tree node that contains the best attribute.

  • Step 4: Repeat recursively for each branch, and make a new decision tree using the subset of the dataset created in step 2.

Fig. 4
A tree diagram of the root node, which has two branches of internal nodes. Each internal node has two branches of leaf nodes.

Basic flow of decision tree algorithm

The process continues until a stage where the nodes cannot be classified further; these final nodes are called leaf nodes. A minimal code sketch of these steps is given below.
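This sketch uses scikit-learn's DecisionTreeClassifier (a CART implementation; C4.5/ID3 variants differ mainly in the split criterion) on synthetic placeholder features; the feature names and labels are illustrative assumptions, not from a real dataset.

```python
# Decision tree fit on placeholder features; the "entropy" criterion selects
# the best attribute by information gain, as ID3/C4.5 do.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))       # e.g., four image-derived features
y = (X[:, 2] > 0.5).astype(int)     # hypothetical lesion / no-lesion labels

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X, y)

# The root-to-leaf paths printed here are the classification rules.
print(export_text(tree, feature_names=[f"f{i}" for i in range(4)]))
```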

Figure 5 explains the use of the decision tree algorithm on an image dataset for analyzing brain images. Neurological disorder image datasets can be collected from various sources (e.g., OASIS, TOF-MRA, Allen Brain Atlas, Alzheimer's Disease Neuroimaging Initiative (ADNI), the fMRI Data Center, etc.). The images are divided into two groups: training and testing. The features of the images are then extracted using a suitable technique, and testing is carried out using a decision tree algorithm (C4.5, ID3, CHAID, etc.). Figure 6 shows the output of lesion detection in a brain MRI image using the decision tree algorithm.

Fig. 5
A flow diagram illustrates how the performance result is obtained from the input image dataset with training and testing data using feature selection, a decision tree algorithm, and validation.

Decision tree algorithm for image analysis

Fig. 6
Three greyscale reports of the human brain, which have a faded outer layer with a dark area in the middle, and highlighted lesions are detected using a decision tree algorithm.

Decision tree algorithm for lesion detection

Linear Regression

Linear regression was first developed in the discipline of statistics to aid in understanding the relationship between numerical inputs and outputs. A linear model assumes that the input variable (x) and the output variable (y) have a linear relationship. Simple linear regression and multivariate linear regression are two separate forms. Simple regression is an approach in which the response prediction is made using a single feature. The linear regression algorithm's purpose is to discover the optimal values for b0 and b1 such that the best-fit line can be found. Figure 7 shows the general plot of the linear regression model [32].

$$ y={b}_0+{b}_1{x}_1 $$
(1)

When multiple features are used, it is called multiple linear regression:

$$ y={b}_0+{b}_1{x}_1+{b}_2{x}_2+\dots +{b}_n{x}_n $$
(2)
Fig. 7
An X-Y plot of the linear regression. It plots an increasing slope along with the data points that provide data for the regression lines with an intercept.

Plot of linear regression

By fitting a linear equation to the observed data, multiple linear regression aims to model the relationship between one or more features and the response. The regression line formula is

$$ y= mx+b $$
(3)

Regression lines are used to predict the value of "y" (dependent variable) for a given value of "x" (independent variable). The best-fit regression line aims to minimize the sum of squared distances between the observed (actual) and forecasted data points. The intercept of the regression line estimates the value of "y" (dependent variable) when "x" (independent variable) has no influence. A minimal fitting sketch is given below.
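This sketch fits Eq. (3) by least squares, which minimizes the sum of squared distances described above; the data points are synthetic and chosen only for illustration.

```python
# Least-squares fit of y = b0 + b1*x on synthetic points.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, deg=1)   # slope and intercept of the best-fit line
print(f"y = {b0:.3f} + {b1:.3f} x")
print("prediction at x = 6:", b0 + b1 * 6)
```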

Figure 8 [34] shows the output of the linear regression model for analyzing the left hippocampus of the functional MRI dataset; Figure 8b shows the linear model analysis of the BOLD signal in the image.

Fig. 8
An M R I image of the human brain, which has bright spots, and a graph of contrast estimate versus vividness reduction that plots an increasing slope along with the data points of the BOLD signal.

(a) Linear regression analysis of the fMRI data (b) Linear analysis of BOLD signal

Logistic Regression

The logistic regression model is one of the supervised machine learning algorithms that may be used to model the probability of an event or class. It is used when the data is linearly separable and the outcome is binary. There are primarily two types: simple logistic regression, which uses a single independent variable to predict the output, and multiple logistic regression, where multiple independent variables are used to predict the output [33]. For simple logistic regression, the model fits the data using the sigmoid function, given as

$$ p=\frac{1}{1+{e}^{-\left({b}_0+{b}_1x\right)}} $$
(4)

For multiple logistic regression, the sigmoid function is given as

$$ p=\frac{1}{1+{e}^{-\left({b}_0+{b}_1{x}_1+{b}_2{x}_2+\dots +{b}_n{x}_n\right)}} $$
(5)

The probability of occurrence of a neurological disorder can be obtained with high accuracy using this algorithm. The input image is preprocessed, features are extracted using a suitable algorithm, and the image set is divided into training and testing data. The logistic regression model then predicts the disorder based on the obtained features; a minimal sketch follows. Figure 9 shows the flow diagram of logistic regression [34].
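This sketch implements the sigmoid of Eq. (4) and fits a scikit-learn model; the features are random placeholders standing in for values extracted from preprocessed images.

```python
# Logistic regression on placeholder features; predict_proba returns the
# modeled probability of the positive class.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # p = 1 / (1 + e^-(b0 + b1*x)), the model behind logistic regression
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))                 # hypothetical extracted features
y = (sigmoid(X @ [1.5, -2.0, 0.5]) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)
print("P(disorder):", model.predict_proba(X[:3])[:, 1])  # class-1 probabilities
```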

Fig. 9
A diagram presents how the input signals with their corresponding weights are added with bias and processed through the sigmoid and prediction.

Flow diagram of logistic regression

Naive Bayes

The Naive Bayes algorithm is one of the simplest and most effective supervised classification algorithms. It helps in building fast machine learning models that can predict rapidly. It is a probabilistic classifier: "naive" because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features, and "Bayes" because it depends on Bayes' theorem: \( P\left(y|X\right)=\frac{P\left(X|y\right)P(y)}{P(X)} \), where y is the class variable and X is the feature vector (of size n), given as X = {x1, x2, x3, …, xn}. Applying Bayes' theorem to the dataset, and noting that for any two independent events A and B, P(A, B) = P(A)P(B), the result is

$$ P\left(y|{x}_1,\dots ,{x}_n\right)=\frac{P(y)\,P\left({x}_1|y\right)P\left({x}_2|y\right)\dots P\left({x}_n|y\right)}{P\left({x}_1\right)P\left({x}_2\right)\dots P\left({x}_n\right)} $$
(6)

It can also be rewritten as

$$ P\left(y|{x}_1,\dots ,{x}_n\right)=\frac{P(y)\prod \limits_{i=1}^{n}P\left({x}_i|y\right)}{P\left({x}_1\right)P\left({x}_2\right)\dots P\left({x}_n\right)} $$
(7)

We can now eliminate that term because the denominator remains constant for every given input:

$$ P\left(y|{x}_1,\dots ,{x}_n\right)\propto P(y)\prod \limits_{i=1}^{n}P\left({x}_i|y\right) $$
(8)

Naive Bayes classifiers have been found to give among the best classification results when analyzing MRI images from Parkinson's and Alzheimer's disease patients [5]. A minimal sketch follows.
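This sketch uses Gaussian Naive Bayes, which applies Eq. (8) under the conditional-independence assumption; the features are random placeholders for MRI-derived measurements, and the labels are hypothetical.

```python
# Gaussian Naive Bayes on placeholder MRI-style features.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 5))
y = (X[:, 0] - X[:, 3] > 0).astype(int)   # hypothetical disease labels

nb = GaussianNB().fit(X, y)
# predict_proba evaluates P(y) * prod_i P(x_i | y), normalized over classes.
print(nb.predict(X[:5]))
print(nb.predict_proba(X[:5]).round(3))
```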

Support Vector Machine

SVM stands for support vector machine. It is the most widely used supervised learning method for classification and regression; however, it is more commonly used for classification. The goal of the SVM method is to determine the best line or decision boundary for dividing n-dimensional space into classes, so that subsequent data points can be easily placed in the relevant category. This ideal decision boundary is referred to as a hyperplane. SVM chooses the extreme points/vectors that help build the hyperplane; these extreme points are known as support vectors, and hence the algorithm is called the support vector machine. The margin is the distance between the two lines through the closest data points of the different classes [26]. It can be computed as the perpendicular distance between the line and the support vectors. A wide margin is considered a good margin, whereas a narrow margin is considered a poor one. Training culminates in the creation of a decision surface that splits the space into two subspaces, each representing a different training-data class. Once training is complete, the test data is mapped to the feature space and assigned a class based on which subspace it falls in. The fundamental goal of SVM is to divide datasets into classes in order to find the maximum marginal hyperplane (MMH), which can be done in two steps [29]:

  1. To begin, SVM iteratively constructs hyperplanes that best divide the classes.

  2. The hyperplane that correctly separates the classes is then chosen.

By applying the above steps, a linear SVM model, which is a binary classifier, divides the space into two classes of MRI images by predicting the hyperplane; a minimal sketch is given below.
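This sketch fits a linear SVM that finds the maximum-margin hyperplane; the two feature columns are synthetic stand-ins for values extracted from MRI images.

```python
# Linear SVM: the fitted coef_/intercept_ define the separating hyperplane,
# and support_vectors_ holds the extreme points that determine the margin.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2, 1, (50, 2)),    # class 0 cloud
               rng.normal(+2, 1, (50, 2))])   # class 1 cloud
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="linear").fit(X, y)
print("support vectors:", svm.support_vectors_.shape[0])
print("hyperplane: w =", svm.coef_[0], "b =", svm.intercept_[0])
```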

Unsupervised Learning

Unsupervised machine learning techniques are effective for applications like clustering, since they do not require labeled data. For example, image classification may be studied using clustering methods. In addition to analyzing existing data, unsupervised clustering algorithms can also be used to make predictions. A model can be trained using datasets from various data sources.

A diagram of unsupervised learning depicts how the clustering algorithm identifies the patient data from the dataset to obtain the output.

K-Means Algorithm

A popular unsupervised learning approach is the k-means clustering algorithm. This approach takes an integer k and n observations. The result is a partition of the n observations into k sets, with each observation belonging to the cluster with the closest mean. The operation of k-means is summarized in the steps below.

  1. Set up k cluster centers. In practice, this can be achieved by selecting k centers at random.

  2. The initial centers are drawn from the n observations or generated as k random points.

  3. Calculate how far each observation is from each cluster center.

  4. Assign each point to the cluster whose center is closest.

  5. Recalculate the positions of the k centers as the means of their clusters.

  6. Once more, compute the distance between each data point and the newly estimated centers.

Steps 3–6 should be repeated until all data points remain in the same cluster (assignments no longer change). A minimal sketch of the procedure is given below.
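This sketch runs the steps above with scikit-learn's KMeans; the feature matrix is a synthetic stand-in for clinical or neuropathological measurements.

```python
# k-means on two artificial patient subgroups; n_init restarts the random
# center initialization several times and keeps the best partition, and
# iteration stops when cluster assignments no longer move.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (60, 4)),
               rng.normal(5, 1, (60, 4))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
print("centers:\n", km.cluster_centers_.round(2))
```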

As an example, a dataset of individuals with Alzheimer's disease (AD) with longitudinal evaluations was drawn from the National Alzheimer's Coordinating Center database, which supplies neuropathological data, and the AD cases were clustered using the k-means technique. The researchers wanted to look at subgroups of patients with different extrapyramidal sign progression trajectories and their clinical and neuropathological correlates [4]. The k-means algorithm clusters the neurological dataset based on disease condition using the above steps [3].

Mean-Shift Clustering Algorithm

The mean-shift clustering method is a centroid-based approach that aids unsupervised learning in a variety of situations. It is one of the most effective image processing and computer vision algorithms. It works by shifting candidate centroids toward the mean of the points within a region, and it is also known as the mode-seeking algorithm. The benefit of the method is that it assigns clusters to the data without the number of clusters having to be specified; the number found depends on the bandwidth.

Unlike the k-means technique, mean-shift does not need an a priori cluster-number specification; the algorithm determines the number of clusters from the data [5]. For fMRI (functional magnetic resonance imaging) analysis of neurological disorders, the temporal properties of the image are used. Simulated and actual fMRI data were utilized to compare cluster analysis with mean-shift clustering, which uses a feature space that includes temporal and spatial features [6]. Figure 10 shows the flow of the mean-shift algorithm used for clustering medical images; a minimal sketch follows.
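This sketch uses scikit-learn's MeanShift on synthetic points standing in for voxel/time features; note that only a bandwidth (estimated here) is supplied, and the number of clusters falls out of the data.

```python
# Mean-shift: no cluster count is given; the bandwidth controls how many
# modes (clusters) are discovered.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(4, 0.5, (50, 2)),
               rng.normal(8, 0.5, (50, 2))])

bw = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bw).fit(X)
print("clusters found:", len(np.unique(ms.labels_)))
```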

Fig. 10
A flowchart illustrates how the segmented image is obtained from the input using preprocessing, clustering, and merging.

Flowchart of mean-shift algorithm used for medical image classification

Affinity Propagation and Hierarchical Clustering

To split brain MRI images into multiple clusters, an affinity propagation approach was used. The affinity propagation method uses tissue-segmented and anatomically parcellated pictures to characterize the similarity between brain images [7]. After clustering, a representative exemplar image is selected as the single-subject atlas for each cluster, and the MRI images belonging to the same subgroup are identified. Figure 11 depicts the brain MRI image's tissue segmentation. The affinity propagation algorithm works as follows:

  • The algorithm works based on similarities among data points.

  • All data points are considered potential cluster centers.

  • Cluster centers (exemplars) that represent the dataset are identified.

Fig. 11
Three structures of the human brain illustrate how the tissues are highlighted and split into multiple clusters with a gradient of clusters.

(a) Original image. (b) Tissue segmented image using affinity propagation. (c) Structure parcellation using affinity propagation

Affinity propagation requires two inputs:

  • Similarities between data points, s(i,k), where i represents a data point and k represents a candidate exemplar

  • Preferences, s(k,k)

A minimal sketch is given below.
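This sketch uses scikit-learn's AffinityPropagation on synthetic placeholder features; by default the similarities s(i,k) are negative squared Euclidean distances, and the preference s(k,k) value used here is an illustrative assumption.

```python
# Affinity propagation: every point starts as a potential exemplar, and the
# algorithm selects the exemplars (cluster centers) from pairwise similarities.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (40, 3)),
               rng.normal(6, 1, (40, 3))])

# The preference s(k,k) controls how readily a point becomes an exemplar;
# lower values yield fewer clusters.
ap = AffinityPropagation(preference=-50, random_state=0).fit(X)
print("exemplar indices:", ap.cluster_centers_indices_)
print("clusters found  :", len(ap.cluster_centers_indices_))
```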

The hierarchical clustering-based segmentation (HCS) is an unsupervised method for separating various sections in image data. The HCS method segments the images by splitting an image into its regions at hierarchical degrees of permitted dissimilarity across the different regions [36]. As the allowable threshold value is increased, the hierarchy represents the uninterrupted merging of similar, adjacent, and disjoint regions [8, 9].

The hierarchical clustering-based segmentation algorithm includes the following steps.

  1. Label each pixel in the image:

     • Label every pixel according to the pre-segmentation if the image has already been segmented.

     • If initial segmentation isn't possible, label every pixel as a separate region.

     • Set the allowed amount of dissimilarity between regions to zero.

  2. Calculate the dissimilarity values among the regions in the given image.

     • Use the least dissimilarity value as the threshold value.

  3. If the threshold equals the dissimilarity value, merge all regions at that dissimilarity value; otherwise, move to step 6.

  4. If the number of regions merged in the above step is greater than 0, reclassify the pixels on the boundary of the combined regions against the remaining regions until no further reclassification occurs. After all accessible border pixels have been classified, save the region data for this iteration as an intermediate segmentation and return to step 2. Otherwise, if the number of regions merged in step 3 equals 0, move to step 5.

  5. Move to step 7 if the number of regions in the image is less than the predetermined value; otherwise, go to step 6.

  6. If the current allowed dissimilarity value is below the maximum permissible value, gradually increase it and go to step 2. If not, go to step 7.

  7. Save the data from the current iteration.
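The HCS procedure above is specialized for region merging in images; as a minimal stand-in, this sketch shows generic agglomerative (hierarchical) clustering, which likewise merges the most similar items as an allowed dissimilarity threshold grows. The data and threshold value are synthetic assumptions, not the authors' HCS implementation.

```python
# Agglomerative clustering: distance_threshold plays the role of the
# "dissimilarity allowed" value; clusters closer than this keep merging.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (30, 2)),
               rng.normal(5, 1, (30, 2))])

hc = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0,
                             linkage="average").fit(X)
print("regions (clusters) remaining:", hc.n_clusters_)
```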

Figure 12 [35] shows the output of the hierarchical clustering algorithm on a brain image obtained by applying the abovementioned procedure.

Fig. 12
The structure of the human brain has bright colors in the middle and is surrounded by dark colors.

Hierarchical clustering for brain image

Density-Based Spatial Clustering (DBSC)

Density-based clustering is a term used to describe unsupervised learning approaches for identifying distinct groups/clusters in input images. DBSCAN (density-based spatial clustering of applications with noise) is the basic density-based clustering technique. It can detect clusters of various shapes and sizes in large amounts of noisy data containing outliers [10].

Two parameters are used in the DBSC algorithm:

  • minPts: The minimum number of clustered points required for a region to be considered dense (a threshold)

  • eps (ε): A distance metric used to find points in the vicinity of a given point

The algorithm then proceeds as follows:

  • Select a point in the image dataset at random.

  • If there are at least minPts points within a radius eps of it, consider them part of the same cluster.

  • Expand the clusters by repeating the neighborhood computation for each nearby point.

Static and dynamic functional connectivity of fMRI images can be analyzed using the DBSCAN algorithm by applying the above procedure [11, 12]; a minimal sketch follows.
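This sketch runs DBSCAN with the two parameters described above (eps and minPts); the points are a synthetic stand-in for fMRI connectivity features, with label -1 marking outliers/noise.

```python
# DBSCAN on two dense clouds plus scattered noise points.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(3, 0.3, (50, 2)),
               rng.uniform(-2, 5, (10, 2))])   # scattered noise points

db = DBSCAN(eps=0.5, min_samples=5).fit(X)     # min_samples is minPts
print("clusters    :", len(set(db.labels_)) - (1 if -1 in db.labels_ else 0))
print("noise points:", np.sum(db.labels_ == -1))
```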

Gaussian Mixture Modeling

Histogram thresholding is one of the most often used methods for segmenting pictures, and Gaussian-mixture-based segmentation builds on it. The assumption in histogram thresholding is that an image has two areas or classes, target and background, each with a unimodal gray-level distribution. As a result, the segmentation problem entails selecting an appropriate threshold for partitioning the picture into target and background regions. In Gaussian-mixture segmentation techniques, the probability density function (PDF) of the gray levels in the picture is a combination of Gaussian density functions with specific means, standard deviations, and proportions [12, 13].

$$ P(x)=\sum \limits_{a=1}^{b}{p}_a\,N\!\left(x\mid {\mu}_a,{\sigma}_a^2\right) $$

where b is the number of regions and

  • pa > 0 are the mixture weights, with \( \sum \limits_{a=1}^{b}{p}_a=1 \), and

$$ N\!\left(x\mid {\mu}_a,{\sigma}_a^2\right)=\frac{1}{{\sigma}_a\sqrt{2\pi }}\exp \left(\frac{-{\left(x-{\mu}_a\right)}^2}{2{\sigma}_a^2}\right) $$

where

  • μa is the mean of class a.

  • \( {\sigma}_a^2 \) is the variance of class a.

Figure 13 [35] shows the input MRI image and the Gaussian output image [14]. White and gray matter in the brain MRI image can be classified to analyze neurological disorders like attention deficit hyperactivity disorder (ADHD); a minimal sketch follows.
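This sketch fits a two-component Gaussian mixture to a one-dimensional gray-level distribution, mimicking target/background segmentation; the intensities are synthetic, not a real MRI histogram.

```python
# Two-component Gaussian mixture over gray levels; each pixel is then
# assigned to the component with the higher posterior probability.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)
intensities = np.concatenate([rng.normal(60, 10, 1000),    # background
                              rng.normal(160, 15, 500)])   # target tissue
X = intensities.reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("means  :", gmm.means_.ravel().round(1))
print("weights:", gmm.weights_.round(3))
labels = gmm.predict(X)   # per-pixel class assignment
```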

Fig. 13
An M R I image of the human brain with dark and faded colors on the left and the gaussian image has highlighted tissue on the right.

(a) Input MRI image, (b) Gaussian image

Convolutional Neural Network (CNN)

CNN stands for convolutional neural network, a type of deep learning neural network. In short, a CNN is a machine learning system that can take an input image, assign relevance (learnable weights and biases) to various aspects/objects in the image, and distinguish between them. Figure 14 shows the block diagram of a CNN [15, 30, 35].

Fig. 14
A diagram presents how the input image of the human brain is processed through feature extraction with several convolutional, pooling, and fully connected layers to obtain the output.

Block diagram of CNN

A CNN extracts information from images by learning features.

The following are the components of any CNN:

  • A grayscale image serves as the input layer.

  • The output is a multi-class labeling system.

  • Convolution layers, rectified linear unit (ReLU) layers, pooling layers, and a fully connected neural network are the hidden layers in the CNN architecture.

Convolutional Layer

It has a convolutional filter of size N × M × D, where N and M are the filter's spatial dimensions in pixels and D is the image depth. Kernels are convolved over the image region, and the dot product between the filter entries and the input is computed during the forward pass [16].

Activation Functions

For nonlinear transformation of medical images, the following activation functions are used:

  • Sigmoid

  • Tan hyperbolic

  • Rectified linear unit

Pooling Layer

Convolved features are down-sampled in the pooling layer. Through dimensionality reduction, it reduces the computing power necessary to process the data. It avoids overfitting and minimizes the translational and rotational variance of images by aggregating data over space or feature types. The input to a pooling operation is partitioned into a collection of rectangular patches, and each patch is replaced with a single value depending on the pooling type. Maximum pooling and average pooling are the two forms [27, 28].

Fully Connected Layer

The fully connected layer is comparable to a deep neural network in that each node has interconnections to all of the inputs and each link has an associated weight. The sum of all inputs multiplied by the weights yields the final output. The classification work is performed by the fully connected layer, which is followed by the sigmoid activation function. Table 1 lists various types of CNN used for the classification of neurological disorders, highlighting some widely used networks; a minimal architecture sketch follows the table.

| S. no. | CNN | Developed by | Neurological disorders |
|--------|-----|--------------|------------------------|
| 1 | Modified LeNet [17] | R.A. Hazarika, A. Abraham, D. Kandar, and A.K. Maji | Mild cognitive impairment (MCI), cognitively normal (CN), Alzheimer's disease (AD) |
| 2 | VGG16 [18] | Jain, R., Jain, N., Aggarwal, A., and Hemanth, D.J. | AD, MCI |
| 3 | AlexNet, GoogLeNet, ResNet50 [19] | Khagi, B., Lee, B., Pyun, J.-Y., and Kwon, G.-R. | AD, healthy control (HC) |
| 4 | ResNet-18 [20] | M. Raza, M. Awais, W. Ellahi, N. Aslam, H.X. Nguyen, and H. Le-Minh | CN, significant memory concern (SMC), early mild cognitive impairment (EMCI), mild cognitive impairment (MCI), late mild cognitive impairment (LMCI), and AD |
| 5 | VGG19 [21] | Bhatele, K.R. and Bhadauria, S.S. | Alzheimer's disease (AD) and Parkinson's disease (PD) |
| 6 | U-Net [22] | Fan, Zhonghao, Li, Johann, Zhang, Liang, Zhu, Guangming, Li, Ping, Lu, Xiaoyuan, Shen, Peiyi, Shah, Syed, Bennamoun, Mohammed, Hua, Tao, and Wei, Wei | AD, late MCI, early MCI |
| 7 | 3D CNN [23] | Yee, Evangeline, Ma, Da, Popuri, Karteek, Wang, Lei, and Beg, Mirza Faisal | Stable dementia, stable normal control |
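The following is a minimal sketch of the convolution → pooling → fully connected layout in Fig. 14, written in Keras; the input shape, filter counts, and three-way class split (e.g., CN/MCI/AD) are illustrative assumptions, not taken from any study in Table 1.

```python
# Small CNN: feature extraction (conv + ReLU + pooling) followed by
# fully connected classification layers.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 1)),         # grayscale brain slice
    layers.Conv2D(16, 3, activation="relu"),   # convolution + ReLU
    layers.MaxPooling2D(2),                    # down-sampling
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),       # fully connected layer
    layers.Dense(3, activation="softmax"),     # e.g., CN / MCI / AD classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```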

Conclusion

Advanced CNN approaches and models in supervised and unsupervised algorithms, together with advances in high-speed computing, provide a unique opportunity to predict and manage a variety of neurological illnesses, such as Alzheimer's disease, Parkinson's disease, and schizophrenia, among others. This article investigated the most popular CNN models for diagnosing neurological disorders using MRI, fMRI, and CT scan datasets, and described the use of CNN techniques to classify the neurological illnesses identified in the literature.