1 Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disorder after Alzheimer’s disease [3, 28]. It adversely affects the central nervous system, which controls balance, body movement and posture. PD generally occurs with the loss of dopaminergic neurons [16] in the substantia nigra, a part of the brain. The symptoms of this disease usually appear slowly and, as it worsens over time, motor symptoms such as tremor, slow movement (bradykinesia), rigidity and difficulty in walking emerge, along with non-motor symptoms such as pain, fatigue, restless legs, speech and communication issues and low blood pressure. It also causes cognitive and behavioral problems, for example depression, anxiety, memory difficulties and apathy. PD affects more than 1% of people older than 60 years of age [33], producing alterations in gait and posture that cause difficulty in movement and an increased risk of falls.

Although Parkinson’s disease worsens over time, no cure has yet been found. Treatment generally aims to improve the patient’s current state but cannot provide a permanent solution. Therefore, research on Parkinson’s disease diagnosis has become crucial for medical treatment and for assessing treatment efficacy, and an accurate biomarker first needs to be identified. Several studies on PD diagnosis have considered speech processing, but for audio data the size of the database is a major problem, as it is often too small (fewer than 60 PD cases) [2, 31, 35]. Being a complex activity involving cognitive, sensory and perceptual-motor components [32], handwriting can be considered a promising biomarker [4], as abnormal handwriting is evident in PD. Moreover, handwriting is a daily human activity, so detecting PD from handwritten images is easy and cost-effective. A wide variety of techniques have been used for PD detection, among which computer-aided systems based on machine learning, optimization and fuzzy logic methods have yielded high accuracy and efficient results.

In this paper we have used a transfer learning algorithm on a handwriting dataset for the detection of Parkinson’s disease, as transfer learning performs well while saving training time and not requiring a large amount of data [24]. First, an image processing step is applied in which each image is converted into a two-dimensional array whose values are the pixel intensities of the image. All pixel values are then brought within the range 0 to 1 by dividing each value by 255, which reduces the spread of the pixel values. After this pre-processing step, data augmentation is applied so that the size of the dataset is increased and over-fitting is suppressed. The augmented data are then passed through a pre-trained transfer learning model, VGG-16, for the detection of Parkinson’s disease. The main contributions of this study are as follows:

1. Two datasets have been combined to resolve the dataset size issue noticed in previous research on Parkinson’s detection from handwriting images.

2. Different architectures have been applied to compare their performance with that of our proposed model.

3. A performance comparison among different optimizers has been illustrated.

4. K-fold cross validation has been applied as a deep learning evaluation approach.

The rest of the paper is organized as follows: Sect. 2 reviews previous work, Sect. 3 describes the methodology, Sect. 4 presents the system implementation, including the experimental tools used to implement our system, Sect. 5 presents the results and discussion, and Sect. 6 concludes the paper and discusses future work.

2 Literature Review

In past years, several research studies have been published on PD diagnosis, among which machine learning techniques have shown promising results [29]. Zuo et al. [39] applied a fuzzy k-nearest neighbors classifier and Particle Swarm Optimization for PD diagnosis, achieving an accuracy of 97.47%.

However, most of the previous works considered signal analysis of the patient’s voice [17, 23], since PD causes voice disorders. A few studies were based on MRI images [20, 21]. Haller et al. [7] presented a system to aid PD detection using Magnetic Resonance Imaging (MRI), as MRI can detect damage in the brain; their SVM-based pattern recognition of DTI classified PD patients with 97% accuracy. Nonetheless, collecting MRI images can be costly, and during image acquisition the patient needs to stay still, which is difficult for a PD patient. Handwriting images are therefore much easier and cheaper to use for PD detection.

Pereira et al. [22] applied machine learning and computer vision techniques to their own handwriting dataset, called “HandPD”, to deal with PD recognition. They used supervised techniques such as OPF, SVM and the Naive Bayes (NB) classifier to evaluate the dataset, among which the NB classifier achieved the best accuracy of 78.9%. This accuracy is quite low and needs to be improved.

Kotsavasiloglou et al. [15] utilized a pen-and-tablet device to show the differences in hand movement and muscle coordination between PD patients and control subjects. The authors considered five measures from 24 PD patients and 20 control subjects. They evaluated different classification algorithms and obtained the best accuracy of 88.63% and an Area Under the ROC Curve (AUC) of 93.1% with Naive Bayes. Zham et al. [36] considered ten features, including both static and dynamic data, from 31 PD patients and 31 healthy subjects, using the NB algorithm for classification. They achieved 83.2% accuracy and 93.3% AUC. In both papers the datasets are relatively small, and the reported accuracies can be improved.

Gallicchio et al. [5] used Deep Echo State Networks (DeepESNs) on spiral and stability drawing tests and achieved 89.3% accuracy. Khatamino et al. [14] worked on PD diagnosis using a convolutional neural network (CNN) based on the AlexNet architecture. Since the dataset used in their paper is small, they considered a smaller number of layers in the implementation, and the accuracy they obtained was 72.5%, which is quite low. Martín et al. [6] also used a simplified version of AlexNet, but they fed spectrum points to the CNN instead of raw data and obtained a good accuracy of 96.5%. All three papers used the same dataset, the Parkinson Disease Spiral Drawings Using Digitized Graphics Tablet dataset [8], in which 62 of the participants were PD patients and only 15 were control subjects, a large imbalance between the two classes that may lead to biased results.

Since in most research on Parkinson’s detection from handwriting the size of the dataset is a major problem, we have tried to overcome this by combining two global datasets in which the numbers of healthy and Parkinson subjects are almost equal, which reduces the bias problem. Moreover, the accuracy of most of the papers discussed above is quite low and has been improved by our model.

3 Methodology

In order to carry out this research, a dataset consisting of handwritten images of PD and non-PD subjects is collected from two different sources. First, the dataset is pre-processed to make it suitable for feeding into the model. Second, the dataset is augmented to increase its robustness. Third, the dataset is split into a 7:3 ratio for training and testing, respectively. A VGG-16 model is then employed for training, as it can effectively extract important features [30], and finally the model is tested.

3.1 Dataset

The detection of Parkinson’s disease is quite difficult, as no definitive medical test for Parkinson’s detection has been discovered so far. The most common symptom, noticeable in almost every patient with Parkinson’s disease, is tremor, which causes changes in writing. That is why we have considered handwriting data in this research. Several global datasets have been used in different works on Parkinson’s disease detection, but their size is a major issue, as the number of images is very small. In this paper we have therefore combined two global datasets to make a larger one: one is collected from the Kaggle website and the other is the HandPD dataset. After combining the two datasets, the total number of images is 808, with 400 images of control subjects and 408 images of Parkinson subjects. We have split our dataset into 80% training and 20% testing.
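To make the loading step concrete, the sketch below shows one possible way of combining the two datasets and performing the 80/20 split with scikit-learn; the directory names and the use of train_test_split are our own assumptions and are not specified in the paper.

```python
# Hypothetical layout: each source dataset has a "control" and a "parkinson" folder.
from pathlib import Path
from sklearn.model_selection import train_test_split

def collect(folder, label):
    # Pair every image path in a folder with its class label (0 = control, 1 = PD).
    return [(p, label) for p in Path(folder).glob("*.png")]

samples = (collect("kaggle_dataset/control", 0) + collect("kaggle_dataset/parkinson", 1)
           + collect("handpd/control", 0) + collect("handpd/parkinson", 1))

paths = [p for p, _ in samples]
labels = [lab for _, lab in samples]

# Stratified 80/20 split so both classes stay balanced across training and testing.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths, labels, test_size=0.2, stratify=labels, random_state=42)
```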

3.2 Pre-processing

Pre-processing of an image generally removes low-frequency background noise, normalizes the intensity of the individual image, and removes light reflections to get rid of image noise, preparing the image for better feature extraction. In our system, we first resize the images to 96 \(\times \) 96 dimensions. Then we convert each image to an array of pixel values. Each pixel value of the array is converted to float and divided by 255.0 so that all pixel values fall within the range 0 to 1. Figure 1 illustrates the whole pre-processing pipeline.
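A minimal sketch of these pre-processing steps (resizing to 96 \(\times \) 96 and scaling by 255) is given below; the use of PIL and NumPy and the function name are illustrative assumptions, not tools named in the paper.

```python
import numpy as np
from PIL import Image

def preprocess_image(path):
    # Resize the handwriting image to 96 x 96 pixels.
    img = Image.open(path).convert("RGB").resize((96, 96))
    # Convert to a float array and scale every pixel value into the range [0, 1].
    return np.asarray(img, dtype=np.float32) / 255.0  # shape: (96, 96, 3)
```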

Fig. 1.
figure 1

Pre-processing steps

3.3 Data Augmentation

Data augmentation is a technique used to increase the size of a dataset by creating modified copies of data that already exist in it, and it is mainly used to reduce overfitting. In this paper, data augmentation has helped to increase the size of the combined dataset. Figure 2 shows examples of the data augmentation.
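The paper does not list the exact transformations applied, so the sketch below illustrates the idea with a common set of operations (small rotations, shifts, zooms and flips) using Keras’ ImageDataGenerator; the specific parameter values are assumptions.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # horizontal shifts
    height_shift_range=0.1,  # vertical shifts
    zoom_range=0.1,          # random zoom in/out
    horizontal_flip=True,    # mirrored copies of the drawings
)

# Placeholder batch standing in for the pre-processed training images and labels.
x_train = np.random.rand(8, 96, 96, 3).astype("float32")
y_train = np.array([0, 1] * 4)

batches = augmenter.flow(x_train, y_train, batch_size=4)
x_aug, y_aug = next(batches)  # one batch of randomly modified copies
```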

3.4 Transfer Learning

We have evaluated our model using VGG-16, a pre-trained model, via transfer learning for the detection of Parkinson’s disease. After pre-processing and data augmentation, the pre-trained model is loaded. The original fully connected layers are then replaced with new dense layers, while the convolutional layers are frozen, neither replaced nor removed, so that the network keeps the strong features it has already learned. The softmax function has been applied as the activation function of the output layer:

$$ Softmax(x_i) =\frac{\exp (x_i)}{\sum _j \exp (x_j)} $$

Figure 3 shows the schematic representation of our whole model.
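A minimal Keras sketch of this transfer-learning setup, assuming the 96 \(\times \) 96 input size from the pre-processing step, is shown below; the size of the new dense layer (256 units) and the choice of optimizer are illustrative, not values reported in this paper.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load the convolutional base pre-trained on ImageNet and freeze its weights.
base = VGG16(weights="imagenet", include_top=False, input_shape=(96, 96, 3))
base.trainable = False

# Replace the original fully connected top with new dense layers and a softmax output.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(2, activation="softmax"),  # two classes: PD vs. control
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```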

3.5 Performance Metrics

In this study, we have evaluated our model based on three performance metrics: precision, recall and F1-score. Precision measures the fraction of positive class predictions that actually belong to the positive class. In a two-class classification problem, precision is computed by dividing the number of true positives by the sum of true positives and false positives. Recall measures the fraction of all positive cases in the dataset that are predicted as positive. In a two-class classification problem, recall is computed by dividing the number of true positives by the sum of true positives and false negatives. Neither precision nor recall can evaluate a model properly on its own: one can have an excellent precision with a poor recall, or an excellent recall with a poor precision. The F1-score balances the two, rendering a single number that conveys both precision and recall.

$$ Precision = \frac{\text {TP(True Positive)}}{\text {TP(True Positive)} + \text {FP(False Positive)}} $$
$$ Recall = \frac{\text {TP(True Positive)}}{\text {TP(True Positive)} + \text {FN(False Negative)}} $$
$$ F1\text {-}score = \frac{2 \times \text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} $$

Here, a true positive is a result where the model correctly predicts the positive class, and a true negative is a result where the model correctly predicts the negative class. A false positive is a result where the model incorrectly predicts the positive class, and a false negative is a result where the model incorrectly predicts the negative class.
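For illustration, the three metrics can be computed with scikit-learn as in the sketch below; the labels are placeholders, not results from our experiments.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # 1 = Parkinson, 0 = control (placeholder labels)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # placeholder model predictions

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1-score: ", f1_score(y_true, y_pred))         # harmonic mean of the two
```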

Fig. 2.
figure 2

Data augmentation

Fig. 3.
figure 3

Schematic representation of the whole system

4 VGG-16 Model Architecture

The VGG-16 model consists of one input layer and thirteen convolutional layers followed by fully connected layers. First, two convolutional layers with 64 kernel filters of size three by three transform the input into dimensions of 224 \(\times \) 224 \(\times \) 64; the corresponding output is passed through a max-pooling layer. Second, two convolutional layers with 128 kernel filters of size three by three follow; their output is passed to another max-pooling layer, which reduces its dimensions to 56 \(\times \) 56 \(\times \) 128. Third, three convolutional layers with a kernel filter size of three by three and a feature map of 256 follow, and the resulting output passes through a third max-pooling layer. Fourth, six convolutional layers with a kernel size of three by three and 512 kernel filters, arranged in two groups of three, are each followed by a max-pooling layer. Lastly, two fully connected hidden layers are followed by a softmax output layer.
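The block structure described above can be sketched in Keras as follows; this is a reconstruction of the standard VGG-16 backbone (2 + 2 + 3 + 3 + 3 = 13 convolutional layers) with a two-class softmax output, not the exact code used in our experiments, and the 4096-unit hidden layers follow the original VGG-16 design rather than values stated in this paper.

```python
from tensorflow.keras import layers, models

def conv_block(model, n_layers, filters):
    # n_layers 3x3 convolutions followed by a 2x2 max-pooling layer.
    for _ in range(n_layers):
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))

vgg16 = models.Sequential()
vgg16.add(layers.Input(shape=(224, 224, 3)))
for n_layers, filters in [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]:
    conv_block(vgg16, n_layers, filters)

vgg16.add(layers.Flatten())
vgg16.add(layers.Dense(4096, activation="relu"))   # first fully connected hidden layer
vgg16.add(layers.Dense(4096, activation="relu"))   # second fully connected hidden layer
vgg16.add(layers.Dense(2, activation="softmax"))   # softmax output for PD vs. control
vgg16.summary()
```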

5 Result and Discussion

5.1 Classification of PD Using VGG-16

Figure 4 graphically represents the accuracy and loss of the VGG-16 model. Here, the purple line depicts the training accuracy of 90.63%, while the grey line depicts the testing accuracy of 91.63%; there is little difference between the two. According to the graph, both accuracies rise in a similar pattern until a plateau is reached after about 17.5 epochs. In addition, the red line denotes the training loss while the blue line denotes the testing loss; both decrease at the same rate with little or no difference. Thus, the VGG-16 model is well fitted to the training and testing portions of the handwriting dataset. Figure 5 shows the confusion matrix, where the diagonal entries give the number of correctly identified test images. For instance, 119 of the 120 PD test images were correctly identified.

Table 1. System architecture description.

5.2 K-fold Cross Validation

Cross validation is a re-sampling method applied to assess machine learning models on a specific data sample. In k-fold cross validation, the dataset is divided into k subsets of equal size. It is generally less biased because of the larger training set, and it is not computationally expensive. In our study we have evaluated our model using k-fold cross validation as a deep learning evaluation approach on our combined dataset, choosing k = 5, i.e., five-fold cross validation.
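A minimal five-fold cross-validation loop is sketched below with scikit-learn’s KFold; the tiny stand-in model and the random placeholder data are assumptions that keep the example self-contained, whereas our experiments use the VGG-16 transfer-learning model described earlier.

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras import layers, models

def build_model():
    # Small stand-in for the VGG-16 transfer-learning model sketched earlier.
    m = models.Sequential([
        layers.Input(shape=(96, 96, 3)),
        layers.Conv2D(16, (3, 3), activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(2, activation="softmax"),
    ])
    m.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
    return m

# Placeholder data; in practice x and y hold the 808 pre-processed images and labels.
x = np.random.rand(20, 96, 96, 3).astype("float32")
y = np.random.randint(0, 2, size=20)

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
accuracies = []
for train_idx, test_idx in kfold.split(x):
    model = build_model()                                       # fresh model per fold
    model.fit(x[train_idx], y[train_idx], epochs=5, verbose=0)
    _, acc = model.evaluate(x[test_idx], y[test_idx], verbose=0)
    accuracies.append(acc)

print("Average accuracy:", np.mean(accuracies))
print("Best accuracy:   ", np.max(accuracies))
```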

Table 2. Five-fold cross validation results
Fig. 4.
figure 4

Model accuracy and model loss

Table 2 presents the results of the five-fold cross validation, together with the average and best accuracy calculated over the five folds. Here, the average accuracy and the best accuracy are 89.88% and 91.36%, respectively.

5.3 Performance Metrics

Researchers usually judge the overall performance and efficiency of a machine learning model using different performance metrics [26]. In our work we have computed these metrics to understand whether our model performs well on our dataset. In this study, performance has been assessed based on three criteria: recall, precision and F1-score. Table 3 presents the comparison of the performance metrics.

Table 3. Performance metrics comparison
Fig. 5.
figure 5

Confusion matrix of Parkinson’s detection

5.4 Comparison of VGG-16 with Other Models

Table 4 compares the training and testing accuracy and loss of several transfer learning models, a CNN and a CNN-LSTM model. It is observed that DenseNet, ResNet, CNN-LSTM and CNN are underfitting, as their testing accuracy is greater than their training accuracy. In contrast, NASNet and InceptionV3 are overfitting, as their training accuracy is greater than their testing accuracy. On the other hand, EfficientNetB7 and EfficientNetB1 appear well fitted, since the differences between training and testing accuracy are only 0.05 and 0.04, respectively. However, these models cannot be accepted, as their training accuracies (0.55, 0.54) and testing accuracies (0.5, 0.5) are much lower than those of the VGG-16 model proposed in this research.

Table 4. Comparison of results

However, the other methods have some limitations. Although CNNs have proven to be among the best algorithms for image classification compared with other machine learning algorithms [1, 9, 27, 38], in a CNN low-level information is transmitted to high-level neurons [18] and additional convolutions are then executed to examine the presence of certain features; internal information about the pose and orientation of an object may be lost. ResNet is much deeper than VGG-16, but over-fitting can occur as ResNet goes deeper (e.g., beyond 1000 layers). The size of a ResNet model is nonetheless smaller than VGG-16 because it uses global average pooling instead of fully connected layers, which reduces the model size. InceptionV3 relies on pointwise convolutions, i.e., 1 \(\times \) 1 filters, and is formed by convolutional layers with different filter sizes that help it learn complex features; for simpler features, VGG-16 is preferable to InceptionV3. In DenseNet, each layer is connected to all the other layers; these excessive connections decrease computational and parameter efficiency, and DenseNet occupies a lot of memory, is prone to over-fitting and uses excessive parameters, which can result in low object recognition accuracy. NASNet has managed deeper neural networks using Reinforcement Learning (RL) [19] and evolutionary algorithms for image processing and computer vision, but NASNet methods are costly for realistic applications; search efficiency is another issue, and achieving the best computer vision results requires a huge number of GPU-days of searching.

6 Conclusion and Future Work

This research provides a solution for the early detection of Parkinson’s disease without relying on clinical trials. Since tremor appears to be among the first symptoms in patients with PD, the focus is directed toward studying distinctive micrographic patterns. To achieve this, handwritten images of PD patients alongside a control group are considered in this research. The dataset is experimented upon with different transfer learning models to obtain an optimal model. Transfer learning is adopted in this research because of the scarcity of publicly available PD handwriting datasets, as well as to save training time and achieve the best performance. It was observed that the VGG-16 model delivered better performance than the other transfer learning models.

In the future, this research aims to collect a dynamic dataset by utilizing an electronic pen-pad for the samples, so that the handwriting motion and the number of frames required to complete one set of hand drawings can be evaluated, since PD debilitates the movement in patients’ handwriting. This will make the diagnosis of PD more accurate and effective. A more efficient system could also be built for PD detection using Belief Rule Based Expert Systems (BRBES) [10,11,12,13, 25, 34, 37].