1 Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disease [4], affecting more than 6 million people worldwide [7]. The common clinical motor symptoms of PD include muscle stiffness, tremor, motor retardation, and gait freezing [9]. These symptoms greatly affect patients’ quality of life. Therefore, using wearable devices to capture patients’ movement information and assist in diagnosing the disease has become a problem worthy of attention [31].

In recent years, many approaches have been developed to classify PD severity in clinical practice. Neuroimaging has been increasingly used as an objective method for the diagnosis of PD [19], but it is expensive and not suited to observing patients outside the hospital. At present, the other clinical standard for PD is the Unified Parkinson’s Disease Rating Scale (UPDRS) [28], a qualitative assessment that relies on the subjective judgment of neurologists. The UPDRS can be administered in daily clinical practice without any expensive equipment. However, the scale tends to be subjective and static. Neurologists record patient reactions during different tasks and assign ratings according to UPDRS requirements, which is time-consuming and influenced by the clinical experience of the doctors. Moreover, doctors only observe the symptoms present during the hospital visit and cannot conduct timely assessments outside the hospital or in other environments [2].

To develop objective criteria that facilitate timely estimation of PD severity, the use of wearable sensors to monitor disease information inside and outside the hospital has received considerable attention [6, 14, 20, 21, 24, 26, 30, 34].

It is necessary to monitor patients with PD remotely and check their symptoms continuously in order to analyze their condition more effectively. Auxiliary diagnosis technology for PD based on wearable devices and machine learning can help individuals detect the disease at an early stage, and can help doctors monitor and evaluate patients inside and outside the hospital, thereby improving the accuracy of clinical diagnosis. It also supports timely and effective adjustment of treatment plans, which reduces the economic burden on patients.

Machine learning (ML) has recently been widely used for medical diagnosis because of its ease of implementation and high accuracy [18, 23, 35]. Jin et al. [15] developed a quantitative measure of bradykinesia that can be conveniently used during the clinical finger-tapping test in patients with PD. Four performance indices were derived from the gyro sensor signal: root-mean-squared (RMS) angular velocity, RMS angular displacement, peak power and total power. The system of Patel et al. [21] used a support vector machine (SVM) to distinguish PD patients from healthy controls based on accelerometer data. Five different types of features were estimated from the accelerometer data: the range of amplitude of each channel, the RMS value of each accelerometer signal, two cross-correlation-based features, and two frequency-based features. Juberty et al. [10] explored extracting chest inclination and leg agility from the Shimmer device, which were estimated using SVM and K-nearest neighbor (KNN) for automatic UPDRS assessment. Aleksei et al. [27] differentiated healthy controls from patients with stage 1 and 2 PD by calculating time, correlation and frequency features, but they only performed disease detection and did not assess disease severity in detail. Guo et al. [12] collected walking data from 10 PD patients in a laboratory setting to detect freezing of gait using the freezing index. However, this research only detects a single motor symptom. Pérez-Ibarra et al. [22] collected data from 5 healthy adults and 7 patients with PD walking on a treadmill as well as on the floor under the guidance of a professional, and developed a real-time adaptive unsupervised algorithm for identifying gait events and phases from a single IMU mounted at the back of the foot. Luis Sigcha et al. [29] used the inertial sensors embedded in consumer smartwatches and different ML models to detect bradykinesia in the upper extremities and evaluate its severity. Six PD subjects and seven age-matched healthy controls were equipped with a consumer smartwatch and asked to perform a set of motor exercises for at least 6 weeks. Chén et al. [5] used smartphone sensors to extract signals from patients performing six specified activities at home, and classified PD patients and healthy people through an automated disease assessment framework. However, this work does not take into account the abnormalities that arise when performing activities at home. To reduce the number of anomalies occurring in home data collection, Erb et al. [8, 13] proposed a scheme in which caregivers completed patient logs to track patients’ daily activities, PD symptoms, and medication intake. However, caregivers have only a vague understanding of symptoms and are unable to correctly identify motor symptoms, which leads to misunderstandings and errors. Martin Ullrich et al. [32] collected data with inertial measurement units over two weeks from 12 patients with idiopathic PD who completed a series of three consecutive 4 \(\times \) 10-Meters-Walking Tests at different walking speeds in addition to their usual daily-living activities.

Although previous research has extracted several features that are effective for classification, these systems primarily focus on extracting common features specific to designated activities. When the target activity changes, features that were effective on the original activity may not remain effective on the new one. Hence, our objective is to identify the most representative features for PD severity classification.

To address the above-mentioned issues, this article focuses on differentiating PD severity and selecting the 20 most representative features related to the disease. More precisely, to ensure data reliability, we first collect data from 85 PD subjects of different severity grades. Each subject performs 8 activities from Part III of the UPDRS scale and is scored by movement disorder neurologists. Our experimental design covers four perspectives. First, to explore the impact of window size on data processing, we segment the dataset using sliding windows and experiment with window sizes of 1 s, 1.5 s and 3 s. Second, we focus on model selection to test the robustness of features, validating several mainstream machine learning models, including Support Vector Machine (SVM), Logistic Regression (LR), and LightGBM (LGBM). Third, we employ a joint model feature selection (JMFS) mechanism to identify the most important common feature set among the eight types of activities. Lastly, we determine the optimal feature dimension by comparing performance across different feature dimensions, such as 10, 20, 30, and so on, and select the dimension with the best performance. The experimental results show that, with the feature set extracted using a window size of 300 samples, the accuracy obtained with the top 20 features chosen by feature selection is 8.22\(\%\) higher than that obtained with all features in the classification of PD severity.

The focus of this study is to assess the severity of patients with PD using a single wearable sensor. Our contributions are as follows:

  • A novel technical pipeline is proposed for fine-grained classification of PD severity and identifying the most representative features.

  • The 20 most representative symptom-related features are identified across 8 UPDRS activities from gyroscope and accelerometer data.

  • We provide ablation experiments on three aspects (model, window size and feature dimension) to ensure the representativeness and generalisation of the proposed features.

The rest of this paper is arranged as follows. Section 2 describes the methods used in this work. We discuss the results of our research in Sect. 3. Finally, Sect. 4 summarizes this paper and puts forward future prospects.

Table 1. UPDRS Paradigm Activities

2 Methods

2.1 Data Acquisition

As part of the research, data were collected at Yunnan First People’s Hospital (China). The study participants were informed about the project and signed a written consent form. The dataset consists of a total of 85 participants: 18 with stage 1, 34 with stage 2, 19 with stage 3 and 14 with stage 4. Other information: age 64 ± 10, gender 35 male and 24 female, height 165 ± 10 cm, weight 56 ± 10 kg. The experiment starts after the data collection consent form has been signed. First, an inertial sensor is worn on the patient’s right wrist. Then, under the guidance of professionals, patients are required to complete the series of 8 tasks in Table 1, which are selected from the UPDRS-III scale based on the advice of neurologists [11]. Each activity is recorded for 20 s without special instructions, and the entire procedure takes approximately 6 min. Each task has a specific purpose, such as evoking specific symptoms of PD. Figure 1 shows the 8 paradigm activities.

2.2 Data Preprocessing and Feature Extraction

The activity data are collected by a Shimmer3 IMU wearable sensor at a sampling frequency of 204 Hz and synchronously transmitted to the computer over a Bluetooth connection. The data include three-axis accelerometer and gyroscope signals. Raw data lines were written into a text file and then converted into CSV format with seven data columns: the timestamp and the x-, y- and z-axis of the accelerometer and gyroscope data.

To isolate the frequencies related to the disease, preserve the original signal as faithfully as possible and reduce noise interference, the raw data are usually filtered. Based on spectrum analysis of the collected signals and a review of the relevant literature, the tremor frequency of PD patients can be divided into three categories: resting tremor 3–6 Hz, postural tremor 4–12 Hz and motor tremor 2–7 Hz [3]. Therefore, a 3–12 Hz band-pass filter is recommended for filtering the patient’s motion signals. After filtering, the data of each axis are normalized by Z-score standardization [25]. A sliding window is then used to segment the time series data. The window division method is Semi-Non-Overlapping Window with an overlap rate of 50\(\%\) [1]. The window should include at least 2–3 activity cycles. In this study, the window size is chosen according to the minimum time point and peak width of the waveform [16]. We therefore test sliding window sizes of 200, 300 and 600 samples to find an optimal window size. Figure 2 shows sliding windows of different sizes used for feature extraction on the waveform of the Activity 1 signal collected by the sensor.
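The normalization and windowing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `zscore` and `sliding_windows` are hypothetical, band-pass filtering is assumed to have been applied beforehand, and the input is a synthetic 20 s single-axis signal sampled at 204 Hz.

```python
import math

SAMPLING_RATE_HZ = 204  # sampling frequency reported for the sensor


def zscore(signal):
    """Standardize one axis to zero mean and unit (population) standard deviation."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    if std == 0:
        return [0.0] * n  # constant signal: avoid division by zero
    return [(v - mean) / std for v in signal]


def sliding_windows(signal, window_size, overlap=0.5):
    """Split a signal into fixed-size windows; neighbours share `overlap` of their samples."""
    step = max(1, int(window_size * (1 - overlap)))
    return [signal[start:start + window_size]
            for start in range(0, len(signal) - window_size + 1, step)]


# Synthetic 20 s recording (a 5 Hz sine), used only to exercise the functions.
n_samples = 20 * SAMPLING_RATE_HZ
raw = [math.sin(2 * math.pi * 5 * i / SAMPLING_RATE_HZ) for i in range(n_samples)]
normalized = zscore(raw)

for size in (200, 300, 600):
    windows = sliding_windows(normalized, size)
    # Prints: window size in samples, its duration in seconds, number of windows.
    print(size, round(size / SAMPLING_RATE_HZ, 2), len(windows))
```

Under these assumptions, a 20 s recording yields 39, 26 and 12 windows for sizes of 200, 300 and 600 samples, which correspond to roughly 1 s, 1.5 s and 3 s at 204 Hz.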

Fig. 1. Eight Representative Paradigm Activities.

Fig. 2. Using sliding windows of different sizes for feature extraction on the waveform of Activity 1 signal collected by the sensor.

After that, 87-dimensional features are extracted from each of the accelerometer and the gyroscope. Time domain features include: maximum, minimum, mean, variance, standard deviation, amplitude (X, Y, Z, A, T), skewness (X, Y, Z, A, T), kurtosis (X, Y, Z, A, T), autocorrelation coefficient maximum and minimum (X, Y, Z, A, T). Frequency domain features include: maximum spectrum, mean (X, Y, Z, A, T), correlation coefficient (XY, XZ, XA, XT, YZ, YA, YT, ZA, ZT, AT), root mean square (X, Y, Z, A, T), energy values (X, Y, Z, A, T), entropy (X, Y, Z, A, T), main frequency (A, T). This gives a total of 174-dimensional features. X, Y and Z respectively represent the three axes of the three-dimensional sensor, A is the fusion axis of the three axes, and T is the inclination axis. For the fusion axis, the fusion representation of the three axes is performed by calculating the signal magnitude vector (SMV), which avoids the influence of the user’s change in a single direction and helps to measure the overall intensity of the activity.
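As an illustration of the fusion axis and a few of the listed statistics, the following sketch computes the SMV as the Euclidean norm of the three axes and derives some per-window features. The function names are hypothetical, the treatment of "amplitude" as peak-to-peak range is an assumption, and the inclination axis T is not covered because its definition is not given here.

```python
import math


def signal_magnitude_vector(x, y, z):
    """Fuse three axes into one magnitude series: sqrt(x^2 + y^2 + z^2) per sample."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def basic_window_features(window):
    """Compute a subset of the listed features for one window of one axis."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((v - mean) ** 2 for v in window) / n  # population variance
    std = math.sqrt(variance)
    rms = math.sqrt(sum(v * v for v in window) / n)
    # Skewness and kurtosis as standardized third and fourth central moments.
    if std > 0:
        skewness = sum((v - mean) ** 3 for v in window) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in window) / (n * std ** 4)
    else:
        skewness = kurtosis = 0.0
    return {
        "max": max(window),
        "min": min(window),
        "mean": mean,
        "variance": variance,
        "std": std,
        "amplitude": max(window) - min(window),  # assumption: peak-to-peak range
        "skewness": skewness,
        "kurtosis": kurtosis,
        "rms": rms,
    }


# Tiny made-up samples, for illustration only.
acc_x, acc_y, acc_z = [3.0, 0.0, 1.0], [4.0, 0.0, 2.0], [0.0, 1.0, 2.0]
fused = signal_magnitude_vector(acc_x, acc_y, acc_z)
features = basic_window_features(fused)
print(fused)
print(features["mean"], features["amplitude"])
```

The same feature function would be applied to each axis (X, Y, Z, A, T) of each window, so the per-window feature vector is the concatenation of the per-axis results.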

2.3 Representative Feature Selection

After feature extraction, feature selection is performed to select the most useful features for disease classification. Because different models measure feature importance on different scales [17, 33] and therefore produce multiple importance rankings, we use a Joint Model Feature Selection (JMFS) mechanism to select common important features. Five models make the joint decision: SVM-L1, SVM-L2, LR-L1, LR-L2 and LGBM.

The LGBM model has low computational complexity and good scalability in calculating feature importance. Because its framework is based on gradient boosting trees, feature importance is obtained while iteratively fitting residuals and selecting the best split points, without being limited by the feature dimension. This makes LGBM suitable for processing large-scale datasets and high-dimensional features. The LR model calculates feature importance in a relatively simple way: owing to its linear nature, the importance of a feature can be measured by the absolute value of the corresponding model parameter. LR also has low computational complexity and good scalability for this task. The SVM model is relatively complex in this respect. Calculating feature importance involves retraining the model and computing support vectors, which may result in high computational complexity and limited scalability, especially when dealing with large-scale datasets and high-dimensional features.
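For a linear model, ranking features by the absolute value of their learned parameters can be sketched as below. The function name, feature names and weight values are all hypothetical and serve only to illustrate the idea; they are not results from this study.

```python
def rank_by_absolute_weight(feature_names, weights):
    """Return feature names ordered from most to least important by |weight|."""
    paired = sorted(zip(feature_names, weights), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in paired]


names = ["acc_x_mean", "gyro_y_max", "corr_xy"]  # hypothetical feature names
coefs = [0.2, -1.5, 0.7]                         # illustrative weights, not real results
print(rank_by_absolute_weight(names, coefs))
# ['gyro_y_max', 'corr_xy', 'acc_x_mean']
```

A negative weight counts as much as a positive one here, since only the magnitude of the parameter is used as the importance score.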

For SVM, the best hyperparameter is selected from [0.0001, 0.001, 0.01, 0.1, 1]; for LR, it is selected from [0.001, 0.01, 0.1, 1, 2]. The learning rate of LGBM is 0.05, and the maximum depth of the tree is 2. The feature selection process is as follows:

  • Input the samples into the model and sort the feature weights generated after training in descending order;

  • Each time the feature weights are sorted in descending order, the 20th weight is taken as the threshold: the importance of the top 20 features is set to 1, and the importance of the remaining features is set to 0. The experiment is repeated 20 times, and the features that occur most often are recorded;

  • Count the features that appear most frequently in the top 20 across the 8 datasets.
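The voting procedure above can be sketched as follows. This is a simplified illustration, not the authors' implementation: `top_k_mask` and `joint_vote` are hypothetical names, the importance scores are made-up numbers, and each entry of `importance_runs` stands for one importance vector produced by one model in one repetition on one dataset.

```python
from collections import Counter


def top_k_mask(importances, k):
    """Mark the k features with the largest importance as 1 and the rest as 0."""
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    selected = set(order[:k])
    return [1 if i in selected else 0 for i in range(len(importances))]


def joint_vote(importance_runs, feature_names, k):
    """Count how often each feature lands in the top k, then keep the k most frequent."""
    votes = Counter()
    for importances in importance_runs:
        for name, flag in zip(feature_names, top_k_mask(importances, k)):
            votes[name] += flag
    # sorted() is stable, so ties keep the original feature order.
    ranked = sorted(feature_names, key=lambda name: votes[name], reverse=True)
    return ranked[:k], votes


# Toy example: 4 features, 3 importance vectors, keep the top 2.
feature_names = ["f0", "f1", "f2", "f3"]
importance_runs = [
    [0.9, 0.1, 0.5, 0.2],
    [0.8, 0.3, 0.1, 0.7],
    [0.7, 0.1, 0.9, 0.2],
]
selected, votes = joint_vote(importance_runs, feature_names, k=2)
print(selected)      # ['f0', 'f2']
print(dict(votes))   # {'f0': 3, 'f1': 0, 'f2': 2, 'f3': 1}
```

In the setting described here, k would be 20 and the runs would cover the five models, the 20 repetitions and the 8 activity datasets. Importance scores from different models must each be converted to a non-negative magnitude (for example, absolute weights) before being passed in.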

3 Experimental Results

The goal of this research was to provide a PD severity diagnosis in three categories, mild (stages 1+2), moderate (stage 3) and severe (stages 4+5), and to select the 20 most representative features related to the disease. At the same time, we explored the most appropriate sliding window size and the optimal feature dimension. To this end, three classification approaches were used as part of this work.
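The grouping of stages into three severity classes can be written as a small mapping. The sketch below uses a hypothetical function name, `severity_class`, and applies it to the per-stage participant counts reported in Sect. 2.1 to show the resulting class sizes.

```python
from collections import Counter


def severity_class(stage):
    """Map a disease stage (1-5) to the three-class severity label."""
    if stage in (1, 2):
        return "mild"
    if stage == 3:
        return "moderate"
    if stage in (4, 5):
        return "severe"
    raise ValueError(f"unexpected stage: {stage}")


# Participants per stage as reported in Sect. 2.1 (no stage 5 participants are listed).
stage_counts = {1: 18, 2: 34, 3: 19, 4: 14}

class_counts = Counter()
for stage, count in stage_counts.items():
    class_counts[severity_class(stage)] += count

print(dict(class_counts))  # {'mild': 52, 'moderate': 19, 'severe': 14}
```

These are participant counts, not window counts. They show that the mild class is the largest, which is worth keeping in mind when reading accuracy figures.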

We validated our approach on datasets collected in a laboratory environment. All experiments were carried out on an ordinary computer with a 2.6 GHz CPU and 8 GB of memory. The evaluation metrics are accuracy, F1-score, precision and recall. The LGBM classifier was used as the final model since it provided the best results.
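For reference, the four metrics can be computed for a three-class problem as in the sketch below. The averaging scheme is not stated here, so macro-averaging over classes is an assumption; the function name `evaluate_predictions` and the label lists are hypothetical examples.

```python
def evaluate_predictions(y_true, y_pred, labels):
    """Return accuracy and macro-averaged (precision, recall, F1) over the given labels."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    # Macro average: unweighted mean of the per-class scores (assumption).
    macro = tuple(sum(scores[i] for scores in per_class) / len(labels) for i in range(3))
    return accuracy, macro


labels = ["mild", "moderate", "severe"]
y_true = ["mild", "mild", "moderate", "severe"]      # made-up labels
y_pred = ["mild", "moderate", "moderate", "severe"]  # made-up predictions
accuracy, (precision, recall, f1) = evaluate_predictions(y_true, y_pred, labels)
print(accuracy, round(precision, 4), round(recall, 4), round(f1, 4))
```

Macro-averaging treats the three severity classes equally regardless of their size; a weighted average would instead favour the larger classes.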

3.1 Representative Features

We use the JMFS mechanism proposed in Sect. 2.3 to select important features from the extracted 174-dimensional features. Because different models measure feature importance on different scales and produce multiple importance rankings, the JMFS mechanism identifies the most important features they share, so that the selected features are robust and general. The final top 20 features are shown in Fig. 3.

Fig. 3. Feature importance ranking. This figure shows the top 20 most important features jointly selected by the five models.

CORR represents axis correlation, and ACV represents autocorrelation coefficient variance. Among the top 20 selected features, CORR and ACV deserve close attention, followed by the maximum and minimum values of the x and y axes, which also play an important role. In addition, we find that the important accelerometer features correspond to those of the gyroscope.

3.2 Sliding Window Size

Table 2 shows the classification accuracy of the 8 activities using different window sizes. We found that windows of different sizes contain different numbers of activity cycles, so the key features extracted vary. From the experimental results, the most suitable window size is 300. For small-amplitude actions, a window of 300 is optimal. For larger-amplitude activities such as Activity 6 (FN-R), a window of 600 gives slightly higher accuracy, possibly because it contains more activity cycles than a window of 300. The window of 200 yields only average accuracy because it contains too few activity cycles, so the model does not learn the motion patterns well. Additionally, window size has little effect on static activities such as Activity 8 (STANDH).

Table 2. Classification accuracy of different window sizes (\(\%\))

3.3 Models

After determining the optimal window size, we conducted a fine-grained classification of PD severity into three classes: mild (stages 1+2), moderate (stage 3), and severe (stages 4+5). The highest accuracy is highlighted in bold, and hand fine category activities are underlined. The experimental results in Table 3 and Table 4 show that hand fine category activities, especially Activity 3 (ALTER), had the best effect on disease classification, reaching 73.15\(\%\).

Table 3. Classification result of PD severity (\(\%\))
Table 4. Classification result of PD severity (\(\%\))

3.4 Feature Dimensions

After ranking key features through the JMFS mechanism, we further explored the optimal feature dimension and identified the most useful features for disease diagnosis. Table 5 compares the accuracy obtained when selecting different numbers of features. We determine the optimal feature dimension by comparing performance across feature dimensions such as 10, 20, 30, and so on, and select the dimension with the best performance. The experimental results show that retaining the 20 most prominent features gives the best classification accuracy of 81.37\(\%\). Figure 4 shows the classification accuracy of different feature dimensions on three activities. Across these experiments all activities followed the same trend, so only a portion of the activities is shown here. The figure makes it more evident that the best classification performance is achieved when retaining the top 20 significant features.
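The dimension search described above amounts to evaluating the classifier on the top d ranked features for several values of d and keeping the best. A minimal sketch follows, in which `best_feature_dimension` is a hypothetical helper and `toy_evaluate` is a made-up stand-in for "train and validate a classifier on these features"; its scores are not results from this study.

```python
def best_feature_dimension(ranked_features, evaluate, dimensions):
    """Score the top-d features for each d and return the best d with all scores."""
    scores = {d: evaluate(ranked_features[:d]) for d in dimensions}
    best = max(scores, key=scores.get)
    return best, scores


def toy_evaluate(features):
    """Toy score: rewards features ranked in the first 20, penalizes each extra one."""
    useful = sum(1 for name in features if int(name[1:]) < 20)
    return useful - 0.1 * (len(features) - useful)


# 174 placeholder feature names, already sorted by importance (f0 is the most important).
ranked = [f"f{i}" for i in range(174)]
best, scores = best_feature_dimension(ranked, toy_evaluate, [10, 20, 30, 40])
print(best, scores)
```

In practice `evaluate` would return a validation accuracy from the chosen classifier, and the candidate dimensions would be taken from the same ranked list produced by feature selection.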

Table 5. Comparison of accuracy for different feature dimensions (\(\%\))
Fig. 4. Accuracy of different feature dimensions on three activities.

4 Conclusion

Accurately capturing the motor symptoms of PD patients is particularly important for determining appropriate medication schedules. In this paper, we differentiate PD severity and select the 20 most representative disease-related features across 8 activities, which provide more representative information than using all features. At present, the best result is obtained with 20-dimensional features on ALTER, with an accuracy of 81.37\(\%\). This facilitates later porting to lightweight devices and provides a reference for independent PD diagnosis in clinical or home environments. In the future, we can explore more feature selection methods and an adaptive sliding window method that adjusts automatically to the cycle of each activity.