
1 Introduction

Parkinson’s disease (PD) is a prevalent and progressive neurodegenerative disorder characterized by motor symptoms such as tremors, bradykinesia, rigidity, and postural instability [1]. Early and accurate diagnosis of PD is essential for timely intervention and personalized treatment strategies to mitigate disease progression and enhance patients’ quality of life.

Recent advancements in machine learning and artificial intelligence have offered new possibilities in medical diagnostics, and researchers have begun exploring their potential in aiding in the identification of early-stage PD [2,3,4]. Among the various motor symptoms associated with PD, handwriting difficulties have shown promise as a potential early indicator of the disease. Handwriting is a complex motor task involving the integration of cognitive, sensory, and motor processes, and its evaluation can provide valuable insights into neurodegenerative conditions [5, 6]. Hand tremors, handwriting difficulties, and other subtle alterations in writing patterns may occur in the early stages of PD, even before other motor symptoms become evident. Therefore, handwriting analysis presents a non-invasive and accessible approach for early PD detection.

Previous studies have attempted to distinguish PD patients from healthy subjects through handwriting. Rosenblum et al. [7] used features such as the mean pressure and speed of participants' handwriting and achieved a classification accuracy of 97.5%. Zham et al. [8] used a tablet equipped with a smart pen to capture dynamic features of the writing process, such as pen pressure and pen inclination, and classified PD from these features. Drotar et al. [9] collected data from participants performing a variety of handwriting tasks and trained three machine learning models, K-Nearest Neighbors (KNN), an AdaBoost ensemble, and a Support Vector Machine (SVM), reaching an accuracy of 81.3%. These studies rely on the integration of additional biological signals, which introduces complexity and reduces the practicality of the diagnostic process.

Recently, researchers have proposed classifying PD directly from handwriting drawings. Ali et al. [10] proposed a novel cascaded system named Chi2-Adaboost, and experimental results on the HandPD dataset showed that the cascaded system achieved a classification accuracy of 76.44%. Akter et al. [11] proposed detecting Parkinson's disease from hand-drawn meander and spiral images of suspected patients, applying the HOG feature descriptor together with classification algorithms such as decision trees, Gradient Boosting (GB), KNN, and random forests. Their experimental results show that GB reaches an accuracy of 86.67% and KNN reaches 89.33%. Despite the success of these attempts to use handwriting analysis for PD diagnosis, capturing the subtle and intricate variations in handwriting patterns associated with the disease remains challenging.

In this paper, we propose a deep learning-based handwriting drawings assessment system dedicated to early Parkinson's disease identification. Unlike previous approaches, our system focuses exclusively on handwriting patterns and requires no additional equipment during the test, harnessing deep learning techniques to discover subtle, meaningful features indicative of PD. We leverage two publicly available datasets, HandPD and NewHandPD, which encompass diverse handwriting samples from both PD patients and healthy controls. Using state-of-the-art computer vision and deep learning models, including EfficientNet-B1, ResNet-34, ResNet-101, and DenseNet-121, we extract discriminative features from hand-drawn images and use them to predict PD accurately and efficiently, based solely on patients' handwriting. Furthermore, we developed a Python Web Server API based on Flask and a user-friendly Windows application, ensuring seamless integration of our assessment system into clinical practice. This implementation facilitates PD screening tests and offers valuable support to healthcare professionals in their diagnostic and management efforts.

The rest of this article is arranged as follows. Section 2 describes relevant research in this area. Section 3 focuses on the methods used and how to set up the assessment system. Section 4 presents and analyzes the results obtained. Section 5 summarizes the work.

2 Related Work

In current clinical practice, physicians typically assess Parkinson's disease with the Unified Parkinson's Disease Rating Scale (UPDRS) [12]. However, since the scale's evaluation depends mainly on the patient's description and the doctor's visual observation, the resulting diagnosis may be affected by the physician's expertise and subjective judgment, which limits its reliability. To date, the identification of Parkinson's disease remains a clinically challenging task. Recently, researchers have shown strong interest in PD detection using machine learning methods based on gait and handwriting data.

Abnormal gait is a common movement disorder in Parkinson's disease and usually requires physical examination by a doctor or gait recognition with the help of devices such as sensors. Recently, researchers have favored non-contact, vision-based methods to identify gait abnormalities in Parkinson's disease. Zhang et al. [13] proposed a new spatiotemporal graph convolution based on a weight matrix and introduced virtual connections to compensate for the limited expression of Parkinsonian gait characteristics by the body's original physical connections; the final model achieved an accuracy of 87.1%. Sabo et al. [14] predicted clinical scores of Parkinsonian gait from videos of dementia patients, with the best model reaching a macro-averaged F1 score of 0.53 ± 0.03. Guo et al. [15] proposed a Sparse Adaptive Graph Convolutional Network (SA-GCN) for fine-grained quantitative evaluation of skeleton sequences extracted from videos, and the evaluation results confirmed its effectiveness and reliability.

One of the most obvious and common symptoms in early-stage PD is resting tremor, which is most apparent in the hands, so handwriting assessment is among the simplest and most effective methods for early screening. Many studies have shown that handwriting is an effective tool for early Parkinson's diagnosis. Aly et al. [16] performed a statistical spectral analysis of the position signals recorded by digitizing tablets for all participants and compared the power spectra of the control and patient groups to detect the presence of tremor. Saunders-Pullman et al. [17] demonstrated that Archimedean spiral analysis can complement motor assessment in early Parkinson's disease.

3 Methodology

In this paper, we developed a system to assess PD handwriting drawings; the proposed framework is shown in Fig. 1. We used spiral and meander images separately to recognize patients. We first performed data preprocessing, then tested various deep learning models to find the one best suited to the task. Finally, we built a Python Web API based on Flask and a Windows application to facilitate clinical use of the Parkinson's handwriting assessment system.

Fig. 1.

Overview of the proposed system framework. (a) Deep learning model training. (b) Windows application structure.

3.1 Dataset

In our experiments, the research focused on handwritten images of spiral and meander tasks. We combined HandPD [18] and NewHandPD [19], two publicly available handwriting datasets that are mainly used for the detection of Parkinson's disease.

HandPD is a large dataset containing samples from 92 subjects (18 healthy subjects and 74 PD subjects) performing spiral and meander tasks, for a total of 736 images. The NewHandPD dataset was collected by the State University of Sao Paulo, Brazil, and contains data from 66 individuals, including 35 healthy subjects and 31 PD subjects. In our experiment, because 21 healthy subjects in the NewHandPD dataset had very poor image quality and ambiguous handwriting, we excluded them from the final experimental dataset. The dataset details are shown in Table 1.

Table 1. Dataset details.

3.2 Separation of the Hand-Written Trace (HT)

Before extracting features from the images, in order to accurately capture the handwritten trace, reduce the influence of the exam template and paper scanning, and preserve the subtle differences between the handwriting patterns of Parkinson's patients and healthy subjects, we developed a method that separates the handwritten trace (HT) from the exam template (ET) using color thresholds. Specifically, we first read the image with the cv2.imread function of the OpenCV library and then converted the color space from BGR to HSV (Hue-Saturation-Value), since HSV separates color information from intensity information: the Hue channel represents the color itself, the Saturation channel its purity, and the Value channel its brightness. This makes it possible to define a threshold range for a specific color so that it can be identified accurately. Finally, to remove the noise introduced by paper scanning, we keep only the pixels within the extracted color range and set the remaining pixels to white. Figure 2 depicts the separation results.
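A minimal sketch of this separation step is shown below; the HSV threshold values are hypothetical and would need to be tuned to the pen color used in the HandPD/NewHandPD scans.

```python
import cv2
import numpy as np

def separate_handwritten_trace(image_path, lower_hsv=(90, 60, 40), upper_hsv=(140, 255, 255)):
    """Keep only pixels whose color falls inside the HSV range (assumed pen color);
    all other pixels are set to white to suppress the exam template and scan noise."""
    bgr = cv2.imread(image_path)                     # read image in BGR order
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)       # convert to HSV color space
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))  # pixels matching the pen color
    result = np.full_like(bgr, 255)                  # white background
    result[mask > 0] = bgr[mask > 0]                 # keep only the handwritten trace
    return result

# Example usage (hypothetical file name):
# ht = separate_handwritten_trace("spiral_sample.png")
# cv2.imwrite("ht_only.png", ht)
```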

Fig. 2.

HT separation results.

3.3 Data Augmentation

Since there are fewer healthy samples than patient samples, it is essential to augment the healthy class to balance the dataset. Because the handwriting trajectories of healthy people are generally stable, we do not change attributes such as color or orientation and only scale and translate the images. This increases the effective number of healthy samples and helps prevent overfitting during neural network training.
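As one possible realization, the scale-and-translate augmentation could be expressed with torchvision transforms (a recent torchvision version is assumed); the parameter ranges and the 224×224 input size are illustrative assumptions, not the exact values used in our experiments.

```python
import torchvision.transforms as T

# One possible augmentation pipeline for healthy-class images only:
# no color or rotation changes, just small random translations and scalings.
augment_healthy = T.Compose([
    T.RandomAffine(degrees=0,               # keep orientation unchanged
                   translate=(0.05, 0.05),  # shift up to 5% horizontally/vertically
                   scale=(0.9, 1.1),        # scale between 90% and 110%
                   fill=255),               # pad with white, matching the paper background
    T.Resize((224, 224)),
    T.ToTensor(),
])
```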

3.4 Handwriting Assessment Based on Deep Learning Methods

In this study, we propose the utilization of ResNet-34 [20], ResNet-101, DenseNet-121 [21], and EfficientNet-B1 [22] deep learning models for the classification of Parkinson’s disease handwriting images, leveraging the advantages of deep learning in extracting deeper-level features compared to traditional CNNs. These models have been pre-trained using the extensive ImageNet dataset. In addition, we use Grad-CAM [23] to visualize the heatmap of the Parkinsonian handwriting features.

Deep Learning Models

ResNet-34 and ResNet-101

ResNet incorporates residual learning to address the vanishing or exploding gradient problem that arises when the network becomes deeper, leading to performance degradation. The key structural feature of ResNet is the introduction of shortcut connections that add input to output, but the residual structures of ResNet-34 and ResNet-101 are different, as shown in Fig. 3.

Fig. 3.

Residual structure. (a) The residual structure of ResNet-34. (b) The residual structure of ResNet-101.

For the residual structure of ResNet-34, the main branch is composed of two \(3\times 3\) convolutional layers, and the connecting line on the right is the shortcut branch. The output feature maps of the main branch and the shortcut must have the same shape so that they can be added element-wise. In the residual structure of ResNet-101, the first layer uses a \(1\times 1\) convolution to compress the channel dimension, the second layer is a \(3\times 3\) convolution, and the third layer is a \(1\times 1\) convolution that restores the channel dimension.
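The two residual variants can be sketched in PyTorch roughly as follows; stride handling, downsampling shortcuts, and the exact channel counts of each stage are omitted for brevity, so this is an illustrative sketch rather than the full torchvision implementation.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet-34-style block: two 3x3 convolutions plus an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)            # add the shortcut; shapes must match

class Bottleneck(nn.Module):
    """ResNet-101-style block: 1x1 compress, 3x3 convolve, 1x1 restore channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.conv1 = nn.Conv2d(channels, mid, 1, bias=False)        # compress channels
        self.conv2 = nn.Conv2d(mid, mid, 3, padding=1, bias=False)  # spatial convolution
        self.conv3 = nn.Conv2d(mid, channels, 1, bias=False)        # restore channels
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        return self.relu(self.conv3(out) + x)
```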

DenseNet-121

DenseNet uses dense connections that feed the feature maps of every layer into all subsequent layers. Unlike other CNN architectures, the input of each layer in DenseNet is the concatenation of the outputs of all previous layers, allowing features to flow from the first layer to the last. If we denote the data by \(x\) and a network layer by \(H\), Eq. 1 shows the output of a traditional network at layer \(l\).

$${x}_{l}={H}_{l}({x}_{l-1})$$
(1)

where \({x}_{l}\) is the feature map at layer \(l\), and it is obtained by applying the layer operation \({H}_{l}\) to the feature map \({x}_{l-1}\) from the previous layer.

Equation 2 shows the output of ResNet after adding the input from the previous layer.

$${x}_{l}={H}_{l}\left({x}_{l-1}\right)+{x}_{l-1}$$
(2)

where \({x}_{l}\) is obtained by applying the layer operation \({H}_{l}\) to the feature map \({x}_{l-1}\) from the previous layer and adding the feature map \({x}_{l-1}\) from the previous layer to it. This is known as a residual connection.

Equation 3 shows that in DenseNet, the output layer connects all previous layers as input.

$${x}_{l}={H}_{l}([{x}_{0},{x}_{1},{x}_{2},\ldots ,{x}_{l-1}])$$
(3)

where \({x}_{l}\) is obtained by applying the layer operation \({H}_{l}\) to the concatenation of all previous feature maps \([{x}_{0},{x}_{1},{x}_{2},\ldots ,{x}_{l-1}]\).
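A simplified dense layer illustrating the concatenation in Eq. 3 might be written as follows; the growth rate and the bottleneck/transition details of the full DenseNet-121 are deliberately abbreviated.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """Simplified DenseNet layer: the input is the concatenation of all earlier feature maps."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.layer = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, 3, padding=1, bias=False),
        )

    def forward(self, features):
        # features: list [x0, x1, ..., x_{l-1}] of previous feature maps
        x = torch.cat(features, dim=1)   # channel-wise concatenation, as in Eq. 3
        return self.layer(x)
```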

EfficientNet

EfficientNet is developed around the idea of jointly scaling the model's depth, width, and input resolution, achieving a balance between computational efficiency and performance. We use EfficientNet-B1, which has strong feature representation ability, as the classifier. The basic building block of EfficientNet is the MBConv module, borrowed from MobileNet V2. To further optimize the network structure, EfficientNet incorporates the squeeze-and-excitation mechanism from SENet [24].

The EfficientNet network structure consists of a Stem, 16 MBConv modules, a Conv2D layer, a GlobalAveragePooling2D layer, and a fully connected layer, of which the 16 MBConv modules are the critical part. In the MBConv module, as shown in Fig. 4, a \(1\times 1\) convolution is first used to expand the channels of the input features, followed by a depthwise convolution. Then the channel attention mechanism of SENet is applied, and finally a \(1\times 1\) convolution reduces the channels of the feature maps.

Fig. 4.

MBConv module network structure.
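A simplified MBConv block following the structure in Fig. 4 could be sketched as below; the expansion ratio, kernel size, and the rule for when the skip connection is applied are illustrative assumptions rather than the exact EfficientNet-B1 configuration.

```python
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel attention from SENet: squeeze to a vector, excite back to per-channel weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(),                                   # Swish activation
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))

class MBConv(nn.Module):
    """Simplified MBConv: 1x1 expand -> depthwise 3x3 -> SE -> 1x1 project."""
    def __init__(self, in_ch, out_ch, expand=6):
        super().__init__()
        mid = in_ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.SiLU(),
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),   # depthwise conv
            nn.BatchNorm2d(mid), nn.SiLU(),
            SqueezeExcite(mid),
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),  # project channels
        )

    def forward(self, x):
        out = self.block(x)
        return out + x if x.shape == out.shape else out   # residual only when shapes match
```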

The activation function used in EfficientNet is the Swish function, a self-gated activation expressed by Eq. 4.

$$\mathrm{swish}(x)=x\,\sigma (\beta x)$$
(4)

where \(\sigma \) is the logistic sigmoid function, and the parameter \(\beta \) can be learned or set as a fixed hyper-parameter.

Gradient-weighted Class Activation Mapping (Grad-CAM)

Grad-CAM is similar to the CAM algorithm. For a category \(c\), we first obtain the weights \({w}_{1},{w}_{2},\dots ,{w}_{n}\) of each channel of the feature map. Unlike CAM, Grad-CAM uses backpropagated gradients to compute these weights, as shown in Eq. 5.

$${\alpha }_{k}^{c}=\frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial {y}^{c}}{\partial {A}_{ij}^{k}}$$
(5)

where \(c\) represents the category, \({y}^{c}\) is the logit for that category (i.e., the value before SoftMax), \(A\) is the feature map output by the convolution, \(k\) indexes the channels of the feature map, \(i,j\) are the spatial coordinates, and \(Z\) is the size of the feature map. This computation averages the gradient over the feature map, which is equivalent to a global average pooling operation.

The obtained weights are then used to compute a weighted sum of the feature maps, producing a heat map. Grad-CAM applies a ReLU to the fused heat map so that only regions with a positive influence on category \(c\) are retained, as shown in Eq. 6.

$${L}_{\text{Grad-CAM}}^{c}=\mathrm{ReLU}\left(\sum_{k}{\alpha }_{k}^{c}{A}^{k}\right)$$
(6)
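A minimal Grad-CAM sketch over a trained PyTorch model is given below; the hook-based implementation and the choice of target layer (typically the last convolutional stage of the backbone) are illustrative assumptions rather than our exact code.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Compute a Grad-CAM heatmap for one image (shape 1xCxHxW) and one class index."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output

    def bwd_hook(_, grad_in, grad_out):
        gradients["value"] = grad_out[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    model.eval()
    logits = model(image)                    # forward pass; y^c is the logit before SoftMax
    model.zero_grad()
    logits[0, class_idx].backward()          # backpropagate the target logit

    h1.remove(); h2.remove()

    A = activations["value"]                                     # feature maps A^k
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)  # alpha_k^c, as in Eq. 5
    cam = F.relu((weights * A).sum(dim=1, keepdim=True))         # weighted sum + ReLU, Eq. 6
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return cam.squeeze().detach()
```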

Evaluation Metrics

We measured the performance of the implemented system in terms of precision, sensitivity, F1 score, and accuracy, using the formulas shown in Eq. 7 to Eq. 10.

$$Precision=\frac{TP}{TP+FP}$$
(7)
$$Sensitivity=\frac{TP}{TP+FN}$$
(8)
$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
(9)
$$F1\;Score=\frac{2\times Precision\times Sensitivity}{Precision+Sensitivity}$$
(10)
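These metrics can be computed directly from the confusion-matrix counts, for example:

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, sensitivity (recall), accuracy, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, accuracy, f1
```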

3.5 Application Development for Windows Platform

The application for the Windows platform was developed in C# within the Visual Studio 2022 integrated development environment. The user interface was built with the Windows Presentation Foundation (WPF) framework. The backend of the application employed the MySQL database management system to securely store and retrieve user login credentials, ensuring that access to the system was restricted to authorized users. To handle the application's core functionality, the Python-based Flask framework was used as the backend server. Flask was responsible for reading the uploaded image, loading the pre-trained deep learning model, running the analysis, and returning the identification results.
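A minimal sketch of such a Flask endpoint is shown below; the route name, port, weights file, input size, and class-index ordering are hypothetical and not the exact implementation used by the deployed server.

```python
from flask import Flask, request, jsonify
from PIL import Image
import torch
import torch.nn as nn
import torchvision.transforms as T
from torchvision import models

app = Flask(__name__)

# Rebuild the two-class EfficientNet-B1 head and load trained weights (hypothetical file name).
model = models.efficientnet_b1()
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 2)
model.load_state_dict(torch.load("pd_efficientnet_b1.pt", map_location="cpu"))
model.eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

@app.route("/predict", methods=["POST"])          # hypothetical route called by the WPF client
def predict():
    image = Image.open(request.files["image"].stream).convert("RGB")
    x = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    label = "Patient" if probs[1] > probs[0] else "Healthy"   # assumed class-index ordering
    return jsonify({"label": label, "confidence": float(probs.max())})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```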

4 Experiment

4.1 Experimental Setup

We use an NVIDIA GeForce RTX 2080 Ti GPU with 12 GB of memory and an Intel Core i9-10900 CPU at 2.80 GHz with 64 GB of RAM to build a deep learning framework with PyTorch in a Windows 10 environment. We use CUDA, cuDNN, OpenCV, and other required libraries to train and test the PD handwriting classification model.

We use color thresholding to extract images containing only handwritten traces, and then set aside 10% of the dataset as a test set. This test set is used to evaluate the final model and contains only original (non-augmented) images, ensuring that the final evaluation is performed on original images only. We then apply data augmentation (scaling and translation) to the remaining healthy-control images to increase their diversity and help the model generalize better. After augmentation, we split the remaining data into training and validation sets at an 80:20 ratio.
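The split procedure could be organized roughly as follows, assuming lists of per-image file paths with binary labels (0 = healthy is an assumed encoding); scikit-learn's train_test_split is used here only as an illustration.

```python
from sklearn.model_selection import train_test_split

def build_splits(original_paths, labels, augment_fn, seed=42):
    """Hold out 10% of the original images as a test set, augment only the remaining
    healthy-class images, then split the rest 80:20 into training and validation sets."""
    rest_x, test_x, rest_y, test_y = train_test_split(
        original_paths, labels, test_size=0.10, stratify=labels, random_state=seed)

    # Augment only the healthy-class samples in the remaining data (assumed label 0 = healthy).
    aug_x = [augment_fn(p) for p, y in zip(rest_x, rest_y) if y == 0]
    rest_x = rest_x + aug_x
    rest_y = rest_y + [0] * len(aug_x)

    train_x, val_x, train_y, val_y = train_test_split(
        rest_x, rest_y, test_size=0.20, stratify=rest_y, random_state=seed)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```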

Since we used pre-trained models, we replaced the final linear layer of each model with a two-unit output layer to classify Parkinson's patients and healthy individuals. The batch size is set to 32, the optimizer is Adam, the learning rate is 0.001, and the number of epochs is 100.
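The corresponding model setup can be sketched as follows, assuming a recent torchvision version; the layer names follow the torchvision implementations of these backbones, and the training loop itself is omitted.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_model(name="efficientnet_b1", num_classes=2):
    """Load an ImageNet-pretrained backbone and replace its final layer with a 2-class head."""
    if name == "efficientnet_b1":
        model = models.efficientnet_b1(weights="IMAGENET1K_V1")
        model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)
    elif name in ("resnet34", "resnet101"):
        model = getattr(models, name)(weights="IMAGENET1K_V1")
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    elif name == "densenet121":
        model = models.densenet121(weights="IMAGENET1K_V1")
        model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model

model = build_model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam, learning rate 0.001
criterion = nn.CrossEntropyLoss()
# Training then loops over the augmented dataset for 100 epochs with batch size 32.
```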

4.2 Early Parkinson’s Disease Identification

To assess the performance of various deep learning models for early Parkinson's disease identification based on handwriting drawings, we conducted separate tests on ResNet-34, ResNet-101, DenseNet-121, and EfficientNet-B1. All models were subjected to the same training conditions, including batch size, learning rate, optimizer, and number of epochs, ensuring a fair comparison.

The experimental results are presented in Table 2.

Table 2. Experiment results.

The results show the performance of each model for both the Meander and Spiral drawing types. For the Meander drawings, the EfficientNet-B1 model achieved the highest precision for the “Patient” class at 97.62%, while the DenseNet-121 model reached a precision of 100% for the “Healthy” class. In terms of sensitivity, the four models perform similarly for the “Patient” class, but EfficientNet-B1 has the highest sensitivity for the “Healthy” class, reaching 92.31%. EfficientNet-B1 also attains the highest F1 score and accuracy among the four models, with an accuracy of 96.36%.

Similarly, for the Spiral drawings, the ResNet-34 model exhibited the highest precision for the “Healthy” class at 100%, but the same model obtained a poor sensitivity of 69.23% for that class. For the “Patient” class, the ResNet-101 model achieved the highest precision at 95.35% and an F1-score of 96.47%. ResNet-101 also achieved the highest overall accuracy at 94.55%.

Our comparative analysis highlights the potential of deep learning models for early Parkinson's disease identification from handwriting drawings. ResNet-34 demonstrated promising results for both drawing types, “Meander” and “Spiral,” exhibiting high precision and sensitivity across the healthy and patient classes. Similarly, ResNet-101, with its deeper architecture, showed commendable performance, yielding competitive precision, sensitivity, and F1-score values. Both ResNet-34 and ResNet-101 achieved relatively high accuracy, suggesting their potential as reliable classifiers for Parkinson's disease identification. DenseNet-121, known for its dense connectivity pattern, also produced compelling results; while its precision, sensitivity, and F1-score values were commendable, its sensitivity was slightly lower than that of the ResNet models.

EfficientNet-B1, designed for an excellent trade-off between efficiency and accuracy, displayed promising results, particularly for the “Meander” drawing type. With the highest F1-score, accuracy, and “Patient”-class precision, this model demonstrated proficiency in distinguishing healthy individuals from Parkinson's disease patients. Considering the diagnostic application, and given that the goal of this study is a simple early handwriting assessment system for Parkinson's that requires no additional equipment, we adopted the EfficientNet-B1 model with meander images as our early screening tool.

Handwriting features in the early stages of Parkinson's disease mainly include hand tremors, handwriting difficulties, and other subtle changes in writing patterns. To assist doctors in making a diagnosis based on Parkinsonian features of writing patterns, we use Grad-CAM to visualize the regions of the input image that have the greatest impact on the model's decision. The heatmaps in Fig. 5 show the important regions in each sample; the patients' Parkinsonian features are strongly activated and appear as highlighted regions.

Fig. 5.

Heatmap of the Parkinsonian handwriting features. The first row shows the healthy group; second row shows the patient group.

We compared our method with other studies that also identify Parkinson's disease from handwriting patterns; as shown in Table 3, our method outperforms the existing approaches.

Table 3. Comparison with other methods.

4.3 Windows Platform Application

The software system interface consists of several pages, with the initial page serving as the login screen. This page authenticates users and verifies their access rights, ensuring that only authorized users are granted access to the system, thus protecting the server from unauthorized entry and preserving valuable computing resources.

Upon successful authentication, users gained access to the assessment system, where they could select the “Upload Image” button to trigger the image upload functionality. Through a designated Web API endpoint and port, the selected images were transmitted to the server for further processing. The server analyzed the uploaded images and generated predictions using the pre-trained deep learning model. Moreover, activation maps for each layer were generated, providing insightful visual representations. The analysis results were then sent back to the client application and displayed in the interface, as shown in Fig. 6.

Fig. 6.

Diagnosis result on Windows Application.

By presenting the analysis results to medical experts, the application empowered them to conveniently diagnose and evaluate the images. The visualized information aided in making informed decisions and drawing accurate conclusions based on the provided insights.

5 Conclusion

The primary objective of this research is to develop a handwriting drawings assessment system that exhibits high precision and sensitivity in identifying early-stage PD. Through extensive experimental evaluation and performance analysis, we found that the EfficientNet-B1 model achieved the best results on meander images, with 96.36% accuracy and 97.62% sensitivity, surpassing the other models and prior studies and demonstrating the effectiveness of our system in detecting early Parkinson's disease.

In addition, the user-friendly application for the Windows platform based on Web API incorporates a secure login process, efficient image uploading, server-side analysis using a deep learning model, and client-side visualization of the results. This comprehensive approach enables effective and convenient image analysis and diagnosis within the medical domain.

Timely PD detection can lead to better disease management and improved patient outcomes. Our system also has potential for broader applications in medical imaging and diagnosis. In the future, we plan to integrate our system into clinical practice, where it can serve as a valuable aid for healthcare practitioners in making accurate and timely diagnoses.