1 Introduction

Parkinson’s Disease (PD) is a chronic, idiopathic neurodegenerative disorder and the most common form of parkinsonism; it is usually predominant in the elderly. Approximately 1% of the world population encounters PD, and the majority of cases manifest complicated motor and cognitive issues. As the disease develops, a prodromal phase of cognitive and behavioral symptoms, including personality changes, depressive disorders, memory dysfunction, and emotional dysregulation, may become apparent. To date, clinicians still rely on conventional methods, diagnosing PD from the prodromal symptoms mentioned above together with motor signs such as slowness, stiffness, tremor, and movement difficulties. However, these symptoms vary from one affected individual to another. Moreover, at present there is no specific blood test or biomarker that can diagnose PD precisely or detect early changes as the condition progresses. Researchers have observed, however, that changes in disease progression can be detected by brain MRI acquired under a dedicated protocol.

PD in its early stage can be misdiagnosed as other syndromes. In particular, PD is frequently confused with atypical parkinsonian disorders (APDs) such as progressive supranuclear palsy (PSP), multiple system atrophy (MSA), specifically the Parkinson variant of multiple system atrophy (MSA-P), and corticobasal degeneration (CBD). In addition, categorizing the clinical entities of PD is difficult. Distinguishing APDs from PD is therefore imperative both for accurate diagnosis during the prodromal phase and for choosing a specific intervention. MRI, a widespread method in modern diagnosis, has proven valuable for detecting PD. In general, MRI images differ in resolution, contrast, and signal-to-noise ratio because of differences in acquisition hardware and scanning techniques. These factors affect the composition of the images and, in turn, the performance of models trained on them.

In this research, we adopt Deep Learning methods because they learn higher-quality representations of MRI image data [18, 21] at multiple levels of abstraction, achieved through multiple processing layers, and thus offer better model performance than other learning approaches. The dataset selected for this study is the public NTUA (National Technical University of Athens) dataset, which is split into training, testing, and validation sets. A DenseNet model is integrated with an LSTM to train on the MRI data samples. DenseNet significantly reduces the number of learnable parameters, because the output of each layer is provided as input to the subsequent layers, which enables feature reuse and a smooth flow of prominent features through the network without loss. Further, within each layer, a mechanism selects features according to their temporal relatedness and their suitability for predicting the class. These selected features are passed through the LSTM layer, where their temporal dependencies are discovered. This exposes the model to the more complex temporal dynamics of the imaging data, improving its learning and classification [11]. In addition, the LSTM mitigates the vanishing gradient problem during backpropagation [3], enabling effective training. The proposed model is then compared with other state-of-the-art CNN models, namely DenseNet, VGG19, InceptionV3, ResNet, and MobileNet.

This section has presented the motivation behind this research. Section 2 assesses related work. Section 3 describes the methodology. Section 4 investigates and evaluates the proposed model. Section 5 outlines the conclusion and future prospects, and Sect. 6 provides the acknowledgment.

2 Literature Review

| Ref. No. | Methodology | Accuracy | Limitation |
| --- | --- | --- | --- |
| [26] | A CAD-based CNN model for automatic diagnosis of PD is proposed. T2-weighted Magnetic Resonance Imaging (MRI) data samples are used to train the model. The model is compared with other ML models | 96% | The complex manifestations of MRI data make it difficult to select an appropriate CNN structure |
| [14] | T1-weighted sMRI scans from 416 patients are included to build the OASIS 3D dataset. A 2D architecture of 3 CNN layers with distinct configurations is proposed to build a classification system for Alzheimer’s. It is compared with other state-of-the-art CNN models | 93.18% | The model may stop learning if the gradients become vanishingly small, preventing the weights from being updated |
| [28] | A review study that includes experiments with different layers of CNN, CNN-RNN, and DNN. It surveys different types of datasets (image, speech, handwriting, sensor, DaTscan, MRI) used with ML and DL models. Neurodegenerative disorders such as Alzheimer’s and PD were investigated | 96% for CNN-LSTM | Retraining the DNN model can lead to forgetting of prior knowledge |
| [19] | A classification model is built to differentiate PD-induced olfactory dysfunction from other types of non-Parkinsonian olfactory dysfunction (NPOD) using T1-weighted axial MRI samples of 30 patients. A 4-layer CNN architecture and other ML models are trained on the data samples | 96.6% | The number of MRI samples is kept small in order to reduce bias. The proposed model suffers from overfitting |
| [23] | A classification model is built to differentiate PD-induced olfactory dysfunction from other types of non-Parkinsonian olfactory dysfunction (NPOD) using T1-weighted axial MRI samples of 30 patients. A 4-layer CNN architecture and other ML models are trained on the data samples | 96.6% | The number of MRI samples is kept small in order to reduce bias |

3 Methodology

Figure 1 shows the methodology followed in this research. First, the dataset is preprocessed by applying normalization and data augmentation parameters, followed by segmentation using the K-means algorithm. Afterwards, the dataset is split into three parts and fed into the training model. The trained model is then used to predict PD labels for samples from the test dataset.

Fig. 1. Methodology

3.1 MRI Data Samples

This dataset comprises MRI examination samples from a total of 78 individuals, 55 of whom suffer from Parkinson’s Disease and 23 of whom are healthy and serve as control subjects. The dataset, named the ‘National Technical University of Athens (NTUA) Parkinson’s Dataset’, is publicly accessible and includes epidemiological, clinical, and paraclinical data of the patients. It contains T1, T2, and FLAIR MRI image samples, where the frames per sequence and the resolution differ between images. Although the images were acquired in DICOM format, which is standard in medical imaging, they were published in PNG format for efficient storage (Fig. 2).

Fig. 2. Sample of NTUA MRI

The total number of data samples is 1387, of which 472 are from non-PD subjects and 915 are from PD patients. The samples were divided into three sets: training, validation, and test [28]. Table 1 shows how the samples are distributed across the sets.

Table 1. Distribution of MRI data samples
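A three-way split of this kind can be sketched as follows. This is a minimal illustration only: the 70/15/15 ratios, the random seed, and the function name `stratified_split` are assumptions, since the actual proportions are those listed in Table 1. Only the class counts (915 PD, 472 non-PD) come from the text.

```python
import random

def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=0):
    """Return train/validation/test index lists that keep per-class proportions.

    The ratios are hypothetical; the paper's actual split is given in Table 1.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * ratios[0])
        n_val = int(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains after train and validation goes to the test set.
        test += indices[n_train + n_val:]
    return train, val, test

# Class counts from the text: 915 PD samples (label 1), 472 non-PD samples (label 0).
labels = [1] * 915 + [0] * 472
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting per class keeps the PD to non-PD ratio roughly the same in all three sets, which matters here because the classes are imbalanced.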

3.2 Pre-processing of MRI Data Samples

First, the MRI image data samples are preprocessed with the ImageDataGenerator class available in Keras. This class provides access to a wide range of pixel scaling methods and data augmentation techniques, including three main pixel scaling procedures. Here, normalization is applied to rescale the pixel values from a range of 0–255 to a range of 0–1. The rescale argument is used to normalize the image dataset before it is fed to the neural network for training [24]. Data augmentation improves the quality and robustness of the data. Augmentation parameters such as shifting, rotation, shear, zoom, and flip are applied. Moreover, augmentation increases the number of data points and reduces the gap between the training and testing datasets, which helps prevent overfitting on the training dataset [27].
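The effect of these steps can be illustrated without any framework. The sketch below is a plain-Python stand-in, not the Keras implementation: it shows rescaling to 0–1 plus two of the listed augmentations (flip and shift) on a tiny nested-list "image". In Keras, the corresponding `ImageDataGenerator` arguments are `rescale`, `horizontal_flip`, `width_shift_range`, `rotation_range`, `shear_range`, and `zoom_range`; the specific values the authors used are not stated, and the helper names and the `max_shift` value below are hypothetical.

```python
import random

def rescale(image):
    """Map 8-bit pixel values (0-255) to the range 0-1."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels, fill=0.0):
    """Shift the image right by `pixels` columns, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]

def augment(image, rng, max_shift=1):
    """Apply a random flip and a random shift (illustrative subset of augmentations)."""
    out = horizontal_flip(image) if rng.random() < 0.5 else image
    return shift_right(out, rng.randint(0, max_shift))

# A made-up 2x3 grayscale image with 8-bit intensities.
image = [[0, 128, 255],
         [64, 32, 16]]
scaled = rescale(image)
augmented = augment(scaled, random.Random(0))
print(scaled)
print(augmented)
```

Augmentation is applied only to training data in typical pipelines, while rescaling is applied to every split so that all inputs share the same value range.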

3.3 MRI Sample Segmentation

Fig. 3. Sample of segmented MRI

First, the MRI samples are converted to the LAB color space, and the adaptive K-means algorithm is employed to segment each image, with the value of K iterated from two to ten. Secondly, a morphological operation is applied to transform the image into two values (a binary image). Lastly, the maximum threshold is recorded by iterating the process to obtain the segmentation outcome. If the number of resulting segments matches the value of K at that iteration, the iteration stops and that segmentation becomes the final outcome.
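The clustering step at the core of this procedure can be sketched as below. This is a simplified illustration under stated assumptions: it runs plain K-means (Lloyd's algorithm) on grayscale intensities for a single fixed K, and omits the LAB conversion, the morphological operation, and the adaptive stopping rule described above. The function names and the toy image are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar intensities into k groups with Lloyd's algorithm."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Recompute each center as the cluster mean; keep the old one if empty.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels

def segment(image, k):
    """Return a label map with the same shape as `image`."""
    flat = [p for row in image for p in row]
    _, labels = kmeans_1d(flat, k)
    width = len(image[0])
    return [labels[r * width:(r + 1) * width] for r in range(len(image))]

# Made-up 3x4 grayscale patch: dark background with a bright region.
image = [
    [10, 12, 200, 205],
    [11, 13, 198, 202],
    [90, 95, 100, 210],
]
mask = segment(image, k=2)
for row in mask:
    print(row)
```

With K = 2 the label map is already binary; for larger K, a thresholding or morphological step like the one described in the text would be needed to reduce it to two values.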

3.4 Proposed Model Architecture

Table 2. System architecture description.

DenseNet is a state-of-the-art and widely recommended convolutional neural network architecture, because it connects each layer to all preceding layers. Compared with existing architectures, this structure mitigates the vanishing gradient problem, strengthens feature propagation, enables feature reuse, and reduces the number of parameters. In general, a deep DenseNet is a sequence of dense blocks connected consecutively, with additional convolution and pooling operations between consecutive dense blocks. This design makes it possible to construct a deep neural network capable of representing complex transformations [13]. The task poses two major obstacles: (i) applying a CNN alone to image sequences is unsuitable, because CNNs were originally designed for static data; and (ii) real image sequences are noisy and high dimensional, so feeding them directly into an LSTM gives poor results. The CNN is therefore used to extract features from each image, and these features are passed to the LSTM, which is capable of handling sequence data and is used to learn the differences across the image sequence (Table 2).
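The dense connectivity and the hand-off to the LSTM can be written compactly. The notation below follows the standard DenseNet formulation and is not taken from this paper: $x_\ell$ is the output of layer $\ell$ in a dense block, $H_\ell$ is that layer's composite function, $[\cdot]$ denotes channel-wise concatenation, $k_0$ is the number of channels entering the block, $k$ is the growth rate, and $c_\ell$ is the number of input channels seen by layer $\ell$. The second line is one plausible reading of the DenseNet-LSTM combination, with $I_t$ the $t$-th image of a sequence, $f_t$ its feature vector, and $h_t$ the LSTM hidden state; the exact wiring is the one given in Table 2.

```latex
x_\ell = H_\ell\big([x_0, x_1, \dots, x_{\ell-1}]\big), \qquad c_\ell = k_0 + k\,(\ell - 1)

f_t = \mathrm{DenseNet}(I_t), \qquad h_t = \mathrm{LSTM}(f_t,\, h_{t-1})
```

Because each layer adds only $k$ new feature maps while reusing all earlier ones, the layers can stay narrow, which is where the parameter savings come from.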

3.5 System Implementation

Google Colab was used to build the training module [7]. This platform speeds up program execution at run time and supports deep learning libraries by providing access to powerful GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units). Python is compatible with the Google Colab environment. The libraries required for the implementation include TensorFlow, Keras, OpenCV, PIL, sklearn, Matplotlib, Pandas, and NumPy. In this study, TensorFlow was used as the backend of the system and Keras was used to build the pre-trained CNN models [1], because it provides built-in functions for layers, optimization, and activation. OpenCV is vital for image processing [25], while sklearn provides access to various supervised and unsupervised algorithms [12]. Matplotlib was used to plot the confusion matrix [12]. Image handling was carried out with PIL, whereas NumPy was used for array operations [10]. Callbacks are used while training the model. Their advantage is that they help avoid both overfitting, which results from training for too many epochs, and underfitting. Checkpoints, early stopping, and reducing the learning rate on a plateau are the callbacks used. Checkpoints preserve the best model by monitoring the validation loss. Early stopping halts training once the model shows no significant change in performance. When the validation loss stops improving, the learning rate is reduced on the plateau.
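The logic of the three callbacks can be illustrated with a small framework-free simulation. This is a sketch of the general behaviour, not the authors' configuration: in Keras the corresponding classes are `ModelCheckpoint`, `EarlyStopping`, and `ReduceLROnPlateau`, while the patience values, the reduction factor, the initial learning rate, and the loss sequence below are all hypothetical.

```python
def run_callbacks(val_losses, lr=1e-3, stop_patience=5, lr_patience=2, lr_factor=0.1):
    """Replay validation losses and apply checkpoint, LR-reduction, and early-stopping rules."""
    best_loss = float("inf")
    best_epoch = None
    epochs_since_best = 0
    epochs_since_lr_change = 0
    history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Checkpoint: remember the epoch with the lowest validation loss.
            best_loss, best_epoch = loss, epoch
            epochs_since_best = 0
            epochs_since_lr_change = 0
        else:
            epochs_since_best += 1
            epochs_since_lr_change += 1
            if epochs_since_lr_change >= lr_patience:
                # Plateau: shrink the learning rate and restart the LR counter.
                lr *= lr_factor
                epochs_since_lr_change = 0
        history.append((epoch, loss, lr))
        if epochs_since_best >= stop_patience:
            # Early stopping: no improvement for `stop_patience` epochs.
            break
    return best_epoch, best_loss, history

# Made-up validation losses: improvement for three epochs, then a plateau.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.64, 0.50]
best_epoch, best_loss, history = run_callbacks(val_losses)
print("best epoch:", best_epoch, "best loss:", best_loss)
print("stopped after epoch:", history[-1][0], "final lr:", history[-1][2])
```

In this toy run, training stops before the last loss value is ever seen, which shows the trade-off in choosing the early-stopping patience: too small a value can cut off a later improvement.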

4 Experimental Result

This section evaluates the performance of the models when trained using the MRI data samples for the classification of PD.

4.1 Comparison of Results

After the MRI data samples are pre-processed, they are fed into state-of-the-art CNN models, namely DenseNet, VGG19, ResNet, MobileNet, and InceptionV3. VGG19, ResNet, and MobileNet reach a training accuracy of 1.0 with negligible loss, whereas their testing accuracies are 0.81, 0.93, and 0.84 and their validation accuracies are 0.82, 0.96, and 0.94, respectively. Since the training accuracy is 100%, these models are likely overfitted. To confirm this, cross-validation needs to be carried out for VGG19, ResNet, and MobileNet.

Table 3. Comparison of results

Fig. 4. Performance curve of InceptionV3

On the other hand, InceptionV3 has a training accuracy of 0.95, a testing accuracy of 0.84, and a validation accuracy of 0.94. Since there is an 11 percentage point difference between the training and testing accuracy, this model can also be said to suffer from overfitting. In the case of DenseNet, the training and testing accuracies are relatively close, and there is little or no significant difference between the validation and training accuracy. Thus, this model is taken forward for further experiments.
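The train-test gaps discussed above can be tabulated directly from the reported accuracies. In the sketch below, the accuracy values are the ones quoted in the text, while the 0.05 cutoff used to flag a model is a hypothetical threshold chosen only for illustration. DenseNet is omitted because its figures appear only in Table 3.

```python
# (training accuracy, testing accuracy) as reported in the text.
reported = {
    "VGG19": (1.00, 0.81),
    "ResNet": (1.00, 0.93),
    "MobileNet": (1.00, 0.84),
    "InceptionV3": (0.95, 0.84),
}

GAP_THRESHOLD = 0.05  # hypothetical cutoff, not taken from the paper

def generalization_gap(train_acc, test_acc):
    """Difference between training and testing accuracy, rounded to two decimals."""
    return round(train_acc - test_acc, 2)

gaps = {name: generalization_gap(tr, te) for name, (tr, te) in reported.items()}
flagged = [name for name, gap in gaps.items() if gap > GAP_THRESHOLD]

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.2f}")
print("flagged as possibly overfitted:", flagged)
```

A single train-test gap is only an indicator; as the text notes, cross-validation would be needed to confirm overfitting.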

Fig. 5. Performance curve of DenseNet

Fig. 6. Performance curve of VGG19

Fig. 7. Performance curve of ResNet

Fig. 8. Performance curve of MobileNet

Figures 4, 5, 6, 7, and 8 graphically represent the performance curves of all the state-of-the-art models applied to train on the MRI samples for the classification of PD.

4.2 Proposed Model Result

Table 3 shows the training, testing, and validation accuracy of the model. The model is well fitted, since there are no significant differences between the training, testing, and validation accuracy. Figures 8 and 9 show the model accuracy and the model loss. In Fig. 8, the red line depicts the training accuracy, which is observed to increase after 20 epochs, while the blue line depicts the validation accuracy. The red and blue lines meet after 30 epochs. In Fig. 9, the blue line represents the validation loss, which is slightly lower than the training loss represented by the red line.

4.3 Performance Metrics of Various Deep Learning Models

To evaluate each model used in this research, performance metrics such as precision, recall, F1-score, and AUC are used (Tables 4 and 5).

Table 4. DenseNet-LSTM result
Table 5. Comparison of performance metrics
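For reference, the four metrics can be computed from labels and predicted scores as sketched below. The labels and scores are made-up toy values, not results from this study, and the 0.5 decision threshold is an assumption. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, and F1-score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = PD, 0 = non-PD; scores are hypothetical model outputs.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, recall, f1 = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} auc={auc:.3f}")
```

Precision, recall, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.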

5 Conclusion and Future Work

This research aims to develop a method that helps to detect and differentiate PD precisely, without relying on traditional approaches. To detect PD early, it is crucial to use an application that can segment the image samples for better visualization and image resolution. Among the various applications available today, MRI plays an important role, as segmenting MRI images gives a clear view of the affected region and can lead to the recognition of PD while it is still in the prodromal stage. To build an optimal learning model, DenseNet integrated with an LSTM is trained on the publicly available NTUA dataset. This model is used in this research because it is better at learning and retaining significant features within its layers. The LSTM layers help by ordering the features according to their dependencies, making them more relevant to the prediction targets. The performance is further evaluated against other state-of-the-art CNN models in order to find which model is most appropriate for the data samples. However, a downside of the Deep Learning approaches used in this research is that they do not support in-depth comprehension, as the models behave like a “black box” [17]. To reduce the reliance on intuitive interpretation, explainable Artificial Intelligence can be introduced in the near future. Further, this model can be extended to classify other syndromes similar to PD, and other models can be used to predict PD [2, 4, 5, 6, 8, 9, 14, 15, 20, 22, 29].