Introduction

Clinical Background

Parkinson’s disease (PD) is a common neurodegenerative pathology that currently affects millions of people worldwide, at an increasing rate.10 PD is caused by a critical loss of dopamine in the forebrain and is characterized by cardinal motor symptoms (i.e., tremor, postural instability, muscular rigidity, bradykinesia)11 and by many non-motor manifestations (NMMs) (e.g., sleep disorders, bladder disturbances, olfactory symptoms), which are widely disabling for PD patients32 and considerably worsen their quality of life. Currently, PD diagnosis is primarily based on the assessment of motor symptoms according to diagnostic criteria.13 Clinical scales, such as the Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS)15 and the Hoehn & Yahr (HY) scale,17 are typically used to assess PD patients, assigning a score according to the disease severity. These scales are semi-quantitative, and the assigned scores are subjective and affected by variability. Specifically, section III of the MDS-UPDRS evaluates motion: patients are asked to perform detailed tasks (e.g., finger and foot tapping, gait), which are visually observed and assessed by neurologists. However, slight worsening in subjects’ performance is difficult to detect visually; thus, a quantitative assessment of motion capabilities in PD patients appears necessary to obtain an objective evaluation of disease onset and progression. In this context, wearable sensors play a fundamental role thanks to their accuracy, unobtrusiveness, and ease of use, both in the hospital and for long-term monitoring,30,36 thus representing a solution for supporting clinicians’ evaluations through the measurement of motor performance.

Meanwhile, interest in NMMs is also increasing, because they are involved in the neuropathological changes in the brain and, as such, can anticipate the motor manifestations of the disease by 5–7 years.24 Recently, experts have recognized the need for new diagnostic criteria7 that aim to investigate and treat the pathology from its onset, when small neurological areas are damaged and neuroprotective therapies could slow down PD progression.31 Among the NMMs, idiopathic hyposmia (IH) is a common symptom affecting > 95% of PD patients.16 Hyposmia is a reduced olfactory sensitivity accompanied by poor perception of flavors that is observed independently of disease duration. IH in healthy adults is associated with a 10–12% increase in risk of developing PD.23,28 Since IH shows high sensitivity but low specificity for PD development, it is not sufficient to identify people with early PD. Imaging diagnostic techniques, able to investigate cerebral pathologies, are currently used to confirm the PD diagnosis, but they require specially trained, dedicated staff and are either invasive (e.g., single photon emission computed tomography [SPECT] with DaTSCAN), expensive (e.g., nuclear magnetic resonance with diffusion tensor imaging [NMR-DTI]), or not specific enough (e.g., transcranial sonography). In particular, SPECT is a cerebral scintigraphy performed using DaTSCAN®, a substance developed by General Electric Healthcare to reveal the presence of dopamine transporters in the brain. Since PD is caused by a critical deficiency of dopamine in the forebrain, scans of PD patients, for instance, show a low level of dopamine.3 NMR-DTI, in contrast, investigates PD pathophysiology by studying white matter fiber integrity.5 In a recent study, NMR-DTI revealed structural disruptions in certain brain networks of people with PD and cognitive impairment.12

Hypothesizing that IH can be assumed as a preclinical biomarker for PD, a two-step approach can represent a valuable solution to identify the early onset of PD22 by combining olfactory screening and non-invasive measurements of motion capabilities in IH subjects.

Related Works

In several recent studies, researchers applied machine learning techniques to classify PD patients and healthy control subjects (HC) based on their motion capabilities in the lower limbs (Table 1). They used different types of technologies, including accelerometers (ACC), gyroscopes (GYR), electromyography (EMG), and force sensors. The majority of the studies focused on gait only, which represents a single task in MDS-UPDRS III, whereas only two works6,21 included alternative exercises, such as foot tapping. Further, three out of nine works analyzed a smaller dataset,4,18,21 which involved 15 or fewer PD patients.

Table 1 Related works.

Different machine learning approaches were implemented in these studies to distinguish between two groups (i.e., healthy subjects and PD patients) with prevalence of Support Vector Machine (SVM),2,6,18,21,33,34 Linear Discriminant Analysis (LDA),6,25,27 and Random Forest (RF).2,4,34 Among these classifiers, LDA seemed to achieve the overall worst results in terms of misclassification rate,25 sensitivity, and specificity (reported in Table 1 as “sens.” and “spec.”, respectively).6 Thus, it will not be used in the proposed work.

Actually, only in one study33 did researchers try to implement a multiclass algorithm, dividing PD patients into two groups—patients with and patients without significant gait impairments—but results for class recall and class precision were poor (i.e., on average < 70%).

To the best of our knowledge, no studies have been conducted that have tried to apply these algorithms to people at risk for developing PD (i.e., IH subjects). Currently, only imaging diagnostic techniques have demonstrated that idiopathic hyposmia is a preclinical marker of this pathology.16

This work aims to investigate differences in motor performance in the lower limbs of different classes of people (i.e., healthy subjects, people with IH, and PD patients), measured by using wearable inertial measurement units (IMU), comparing different supervised machine learning approaches. In particular the authors propose to:

  (i) analyze a comprehensive experimental protocol for a complete motor evaluation of lower limbs in PD patients using four exercises (i.e., leg agility, toe tapping, gait, and rotation) according to the tasks described in MDS-UPDRS III, while increasing the number of exercises compared to previous works;

  (ii) evaluate a wide set of extracted features related to biomechanical characteristics of the users that allow examination of spatial, temporal, and frequency parameters of the lower limb movements;

  (iii) investigate the performance of supervised machine learning approaches for motor assessment of lower limb performance in Parkinson’s disease by comparing three different classifiers (i.e., SVM, RF, and Naïve Bayes [NB]).

Materials and Methods

Participants

Thirty HC (25 male, 5 female, mean age ± standard deviation [SD] 65.2 ± 2.5), 30 IH subjects (21 male, 9 female, mean age ± SD 66.0 ± 3.2), and 30 PD patients (25 male, 5 female, mean age ± SD 67.9 ± 8.8) were involved in this study. The PD patients were mild to moderate (mean MDS-UPDRS III ± SD score: 14.7 ± 8.6; mean HY ± SD score: 1.9 ± 0.8), with the side of disease onset almost equally distributed (9 unilateral right, 10 unilateral left, and 11 bilateral). The three groups were age-matched. Subjects with IH were recruited through the IPMP-MS Project (2013–2016, coordinated by ASL1MS Department of Neurology, Carrara Hospital). Screening for IH was performed using the IOIT olfactory test,23 and the relationship between IH and the development of PD was examined in a wide sample of subjects. For PD patients, the exclusion criteria consisted of impairments or diseases other than PD that could affect the performance of daily activities (e.g., orthopedic and neurological disorders). All patients were clinically assessed using the MDS-UPDRS III15 and HY.17 All measurements were performed in a clinically defined ON-state. All subjects provided written informed consent, and the study procedure was approved by the Ethical Committee of ASL1 (Azienda Sanitaria Locale, Massa e Carrara, Italy, n°1148/12.10.10) in accordance with the most recent Declaration of Helsinki.

Instrumentation

A wearable device based on an IMU was developed to objectively analyze the motor performance of the subjects involved in the study. The device, named SensFoot V2 (Fig. 1), is low-cost, low-power, non-invasive, small, lightweight, wireless, and easy to use. It is powered by rechargeable LiPo batteries and collects data at a sampling frequency of 100 Hz. The device, which consists of an IMU integrated into the iNEMO-M1 board, is based on microelectromechanical systems (MEMS) sensors (three-axis gyroscope L3G4200D and six-axis geomagnetic module LSM303DLHC) and an ARM-based 32-bit microcontroller STM32F103RE (STMicroelectronics, Italy). The system is integrated with a Bluetooth module (SPBT2632C2A, v3.0, STMicroelectronics), which wirelessly transmits acquired data to a remote personal computer for offline analysis. The device is fixed on the dorsum of the subject’s foot with an elastic band to avoid movement between the foot and the sensor.

Figure 1 SensFoot V2 wearable device.

Experimental Protocol

To assess the motor performance of the subjects, the experimental protocol was defined in agreement with the neurologist, following the tasks described in the MDS-UPDRS III. In particular, four exercises were selected from tasks 3.7, 3.8, and 3.10 of the MDS-UPDRS III, which are the only ones focused on lower limb motion analysis: toe tapping with heel pin (TTHP), heel tapping (HEHE), gait (GAIT), and rotation (ROTA), with the last two both derived from task 3.10.

First, the correct execution of each exercise was shown by the clinical staff, and every subject attended a short preliminary training to try all required movements. At the beginning of each exercise, the subject was asked to maintain a specific fixed position to allow 3 s of static acquisition, which represents a baseline for each trial. Specifically, regarding the tapping exercises (TTHP and HEHE), the subjects assumed a comfortable sitting posture, holding right angles between trunk and thigh and between thigh and shin; they stood with their arms at their sides for ROTA and GAIT. Then, the subjects were asked to perform the following exercises:

  • TTHP: the subjects tapped their toe on the floor, always keeping the heel in contact with the floor, for 10 s, performing the tapping action as quickly and widely as possible (MDS-UPDRS 3.7: toe tapping).

  • HEHE: the subjects tapped their heel on the floor, always keeping the forefoot raised from the floor, for 10 s, performing the tapping action as quickly and widely as possible (MDS-UPDRS 3.8: leg agility).

  • GAIT: the subjects started the gait with the non-sensorized foot and walked 15 m in a straight line, at their preferred speed, until reaching the finish line (MDS-UPDRS 3.10: gait).

  • ROTA: the subjects turned 360° in a clockwise or anticlockwise direction. The rotation was performed in a clockwise direction when SensFoot V2 was placed on the right foot and in an anticlockwise direction when SensFoot V2 was placed on the left foot (MDS-UPDRS 3.10: turn within gait).

Pre-processing

The inertial data acquired with SensFoot V2 were stored and processed offline via Matlab® R2016b (The MathWorks, Inc., Natick, MA, USA). Before computing parameters, data from the accelerometer and gyroscope were filtered with a fourth-order low-pass digital Butterworth filter with a 5 Hz cut-off frequency to eliminate high-frequency noise and tremor frequency bands,20 typically identified between 3.5 and 7.5 Hz.29 The choice of a 5 Hz cut-off frequency represents a trade-off between the need to remove pathological tremor and the need to preserve significant information from the acquired signals. Indeed, a cut-off frequency lower than 5 Hz could remove useful information if the exercise is executed particularly fast. On the other hand, a cut-off frequency higher than 5 Hz could retain tremor noise together with the useful signal when Parkinsonian tremor occurs.
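The trade-off behind the 5 Hz cut-off can be illustrated with the ideal magnitude response of an n-th order Butterworth low-pass filter, |H(f)| = 1 / sqrt(1 + (f/fc)^(2n)). The following Python sketch is not part of the original Matlab processing: the function name is hypothetical, and the formula describes a single pass of the analog prototype, so it ignores the frequency warping of the digital implementation and any forward-backward filtering that may have been applied.

```python
import math


def butterworth_gain(f_hz, cutoff_hz, order=4):
    """Ideal magnitude response of an analog Butterworth low-pass filter.

    Assumption: single-pass analog prototype; a digital filter designed
    with the bilinear transform deviates slightly from this curve.
    """
    return 1.0 / math.sqrt(1.0 + (f_hz / cutoff_hz) ** (2 * order))


# Compare how much of a movement component (1-2 Hz) and of the tremor
# band edges (3.5 and 7.5 Hz) would pass a 5 Hz, fourth-order filter.
for f in (1.0, 2.0, 3.5, 5.0, 7.5):
    print(f"{f:4.1f} Hz -> gain {butterworth_gain(f, cutoff_hz=5.0):.3f}")
```

Under these assumptions, a 2 Hz component keeps more than 99% of its amplitude, whereas a 7.5 Hz component is reduced to less than 20%; components between 3.5 and 5 Hz are only mildly attenuated, which is the compromise described above.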

For gait analysis, by contrast, the cut-off frequency was set to 3 Hz, since the cadence during free walking is typically around 1–2 Hz. Custom-made algorithms were implemented to identify the characteristic times that allowed the signal segmentation for each exercise. Angular rates were integrated using the trapezoidal rule, with sub-intervals of integration equal to the inverse of the sensor sampling rate (Δt = 10 ms), to calculate the movement amplitudes. A linear drift correction was applied step by step to avoid cumulative effects.
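The integration and drift-correction steps can be sketched as follows. This is an illustrative Python version, not the custom Matlab algorithms used in the study; the function names are hypothetical, and the drift model (the foot returns to its starting orientation at the end of each segmented movement, so any residual angle is attributed to drift) is an assumption made for the example.

```python
def integrate_trapezoid(rate, dt=0.01):
    """Cumulative trapezoidal integral of an angular rate (deg/s -> deg).

    dt = 0.01 s corresponds to the 100 Hz sampling frequency.
    """
    angle = [0.0]
    for prev, curr in zip(rate, rate[1:]):
        angle.append(angle[-1] + 0.5 * (prev + curr) * dt)
    return angle


def remove_linear_drift(angle):
    """Subtract the straight line joining the first and last samples.

    Assumption: within one segmented movement (e.g., one tap or one
    stride) the true angle starts and ends at the same value.
    """
    n = len(angle)
    if n < 2:
        return list(angle)
    slope = (angle[-1] - angle[0]) / (n - 1)
    return [a - angle[0] - slope * i for i, a in enumerate(angle)]


# A constant gyroscope bias of 10 deg/s over 1 s integrates to 10 deg of
# apparent rotation, which the linear correction removes.
bias_only = [10.0] * 101
drifted = integrate_trapezoid(bias_only)
corrected = remove_linear_drift(drifted)
```

Applying the correction to each segment separately, rather than once over the whole recording, keeps integration errors from accumulating across repetitions.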

The parameters measured for each exercise are reported in Table 2 and detailed in the Appendix. Each subject was examined twice consecutively, for both the right foot (R_FOOT) and the left foot (L_FOOT). For comparison between groups, the mean value of the repeated measures was used.

Table 2 Extracted features during experimental sessions.

Feature Selection

Three different datasets were created and analyzed from the obtained parameters. In the first case, a dataset composed only of 30 HC versus 30 PD, named dataset 2C60, was designed. In the second case, the IH were considered as healthy subjects, which resulted in a dataset composed of 60 HC (30 HC plus 30 IH) and 30 PD, named dataset 2C90. Finally, the three different groups of subjects were considered separately (i.e., 30 HC, 30 IH, and 30 PD), for dataset 3C90. These three datasets were built to analyze how the system was able to distinguish among the three groups and how the performance changed when the IH were considered as healthy subjects or not considered at all.
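The construction of the three datasets amounts to selecting and relabeling subjects. A minimal Python sketch is given below, assuming each subject carries one of the group labels "HC", "IH", or "PD"; the function name and the data layout are hypothetical and chosen only for illustration.

```python
def build_datasets(groups):
    """Return (subject index, class label) pairs for the three datasets.

    groups: list with one label per subject, each "HC", "IH", or "PD".
    """
    return {
        # HC vs. PD only: IH subjects are left out.
        "2C60": [(i, g) for i, g in enumerate(groups) if g != "IH"],
        # IH subjects are treated as healthy.
        "2C90": [(i, "PD" if g == "PD" else "HC") for i, g in enumerate(groups)],
        # Each group keeps its own class.
        "3C90": list(enumerate(groups)),
    }


groups = ["HC"] * 30 + ["IH"] * 30 + ["PD"] * 30
datasets = build_datasets(groups)
```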

The Kolmogorov–Smirnov test was applied to determine whether the acquired data were normally distributed. Since none of the extracted parameters followed a normal distribution, nonparametric tests were adopted: the Kruskal–Wallis test was used to investigate the statistical significance of each feature in discriminating among the three groups (i.e., dataset 3C90), while a Wilcoxon rank sum test was applied to identify significant differences between two groups (i.e., datasets 2C60 and 2C90). The significance level was set at p < 0.05. In addition, Spearman’s correlation coefficients were separately assessed for each foot to eliminate the features that showed a high degree of correlation. If two or more features gave a high correlation coefficient (ρ > 0.85), only one of the features was retained. This dimensionality reduction was performed to decrease the size of the feature array and to avoid over-fitting in the following classification step.
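The correlation-based reduction can be sketched in Python as follows. The source does not state which feature of a correlated group was retained, nor whether negative correlations were considered, so the greedy rule (keep the first feature encountered) and the use of the absolute value of ρ are assumptions; all function names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def prune_correlated(features, threshold=0.85):
    """Keep a feature only if |rho| with every kept feature <= threshold.

    features: dict mapping feature name -> list of values (one per subject).
    """
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy values (not study data): "b" increases monotonically with "a".
toy = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 11], "c": [3, 1, 4, 1, 5]}
selected = prune_correlated(toy)
```

In this toy example "b" is discarded because its rank correlation with "a" is 1, while "c" is retained.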

Supervised Classifiers

Three supervised learning classifiers (i.e., SVM, RF, and NB) were implemented to distinguish among the different groups of subjects involved in this study according to the three defined datasets (2C60, 2C90, 3C90).

Support Vector Machine

SVM seeks the maximum-margin hyperplane separating the classes, where the support vectors are the training samples closest to the decision surface; maximizing the margin helps to limit over-fitting. Beyond the success of SVM in many classification problems,2,6,18,21,33,34 this approach was chosen for its good generalization performance26 and its capability to perform well on large datasets with many attributes.19 SVM suffers from high computational cost and large memory requirements, but these issues are not relevant for the proposed work because the analysis was made offline. SVM can implement different types of kernels (e.g., linear, quadratic, Gaussian or RBF, polynomial), which define the function for generating the hyperplane.9 In this work, a built-in function of MATLAB was used for the SVM, setting a third-order polynomial kernel.

Random Forest

RF is an ensemble learning algorithm composed of multiple decision trees. The assignment of a new observation vector to a class is based on a majority vote of the decisions provided by each tree constituting the forest. It is robust to noise and outliers because it is based on a random step in the process of creating the trees and selecting a splitting feature.8 RF was chosen in this work due to the good results demonstrated in previous works.2,4,34 Moreover, this method performs well on large datasets and its parametrization is quite simple.8 Although RFs are not computationally intensive, the model can require a large amount of memory;14 since the proposed classification is intended to work offline, this factor was not considered in this work. To implement the RF, a package for Matlab®35 based on the Breiman et al. algorithm8 was used.

Naïve Bayes

NB classifiers are probabilistic learning algorithms based on Bayes’ theorem. They calculate the probability of each category for a given sample and output the category with the highest probability. This algorithm requires little training data and little storage space, and it is quick to train. Moreover, it does not have many parameters to set. However, NB relies on the strong assumption of conditional independence of the features (i.e., all attributes are independent given the value of the class variable). Although it often performs well even when there are dependencies between attributes, its performance can decrease when two or more features are strongly correlated.1 The NB classifier was used in this work because of these properties and because its probabilistic approach differs from SVM and RF; it was implemented with a built-in MATLAB function.
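The decision rule described above can be sketched in a few lines of Python. This is not the MATLAB built-in function used in the study: the per-feature Gaussian likelihood is an assumption (the source does not state which distribution was used), and the function names and toy values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Conditional independence: the log-likelihoods of the features add up.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


# Toy two-feature samples (not study data).
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
y_toy = ["HC", "HC", "HC", "PD", "PD", "PD"]
nb_model = fit_gaussian_nb(X_toy, y_toy)
```

Because the feature log-likelihoods are simply summed, two strongly correlated features effectively count the same evidence twice, which is why the correlation-based feature reduction is particularly relevant for this classifier.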

These classifiers were trained and tested to evaluate their ability to correctly recognize the health status of the subjects. Three different classifications were implemented, and relative confusion matrices were calculated for 2C60, 2C90, and 3C90.

A ten-fold cross-validation method was used, which ensured that unknown samples that were never presented to the system during the training phase were always available for testing. For each fold, the classifier was employed to train a model and assess its performance. The method was then used to calculate the average test error over all the folds.
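The cross-validation procedure can be sketched as follows. The source does not state whether the folds were shuffled or stratified by class, so the plain shuffled split below is an assumption; the function names are hypothetical, and `fit` and `predict` stand for any classifier supplied by the caller.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k disjoint folds.

    Assumes n_samples >= k so that no fold is empty.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validated_error(X, y, fit, predict, k=10):
    """Average misclassification rate over the k test folds."""
    errors = []
    for train, test in k_fold_indices(len(X), k):
        # The model never sees the samples of the current test fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(model, X[i]) != y[i] for i in test)
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)
```

With 90 subjects and ten folds, each model is trained on 81 subjects and tested on the remaining 9.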

The performance of the classifiers was evaluated in terms of sensitivity or recall (i.e., percentage of cases that are correctly identified as true, Eq. (1)), specificity (i.e., percentage of cases that are correctly identified as false, Eq. (2)), precision (i.e., percentage of cases correctly identified as true with respect to all cases diagnosed as true, Eq. (3)), accuracy (i.e., percentage of cases that are correctly identified over all subjects, Eq. (4)), and F-measure (i.e., the harmonic mean of precision and recall, Eq. (5)). To obtain these measurements, the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) values were calculated.

$$ {\text{Recall}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}} $$
(1)
$$ {\text{Specificity}} = \frac{\text{TN}}{{{\text{TN}} + {\text{FP}}}} $$
(2)
$$ {\text{Precision}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FP}}}} $$
(3)
$$ {\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}} $$
(4)
$$ F{\text{-measure}} = 2 \times \frac{{{\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}} $$
(5)
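Equations (1)–(5) translate directly into code. The Python sketch below uses hypothetical counts for a test set of 30 patients and 60 healthy subjects; they are chosen only to illustrate the formulas and are not taken from the confusion matrices of this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute Eqs. (1)-(5) from the four confusion-matrix counts."""
    recall = tp / (tp + fn)                         # Eq. (1)
    specificity = tn / (tn + fp)                    # Eq. (2)
    precision = tp / (tp + fp)                      # Eq. (3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # Eq. (4)
    f_measure = 2 * precision * recall / (precision + recall)  # Eq. (5)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical example: 28 of 30 patients detected, no healthy subject
# flagged as a patient.
example = binary_metrics(tp=28, fp=0, tn=60, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Note that the F-measure depends only on precision and recall, so it is insensitive to the number of true negatives, unlike accuracy and specificity.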

The methodological approach implemented in this work for comparative assessment of motor performance is summarised in the flow chart reported in Fig. 2.

Figure 2 Flow chart showing the implemented methodology for motor assessment of different groups of people.

Results

The results obtained from inertial data analysis and the motor performance assessment using three different supervised classification (SVM, RF, NB) approaches on three different datasets (2C60, 2C90, 3C90) are reported in this section.

Feature Selection

The 23 extracted features, both for right and left sides, from lower limb tasks, are reported as mean values and standard deviations for each class in a table uploaded as Supplementary Data online. The p-values calculated with significance tests, considering the three different datasets, are also reported, and statistical differences are marked with *.

For dataset 2C90, 13 parameters per limb showed significant differences between the two classes. In dataset 3C90, L_FOOT RO_RS additionally detected differences among the three groups (14 parameters per limb), and in dataset 2C60, a further significant feature was L_FOOT RO_Freq, which resulted in 15 parameters per limb.

Since some parameters are highly correlated with other parameters, and redundancy of information can decrease the accuracy of group detection, according to the Spearman’s correlation coefficients, the final datasets were respectively reduced to 12 features per limb for 2C60, 10 features per limb for 2C90, and 11 features per limb for 3C90 by deleting RO_STT, TT_Freq, and HE_Power.

In summary, the features used to identify differences among all the datasets were GT_Time, GT_Strd, GT_Exc, RO_Time, RO_Strd, TT_Taps, TT_IAV, HE_Freq, HE_Peak, and HE_IAV. Additionally, RO_RS contributed in 3C90 and 2C60, while RO_Freq was included only for 2C60.

Classification Results

The classification results, reported in Table 3, consider the right foot and left foot separately as well as both feet together.

Table 3 Comparative results from SVM, RF, and NB classifiers in terms of Precision, Recall, Specificity, Accuracy, and F-measure for the three datasets (3C90, 2C90, 2C60) calculated on right foot, left foot, and both feet.

Generally, results derived from both feet are slightly better than those obtained using a single limb, but there are also some cases in which a single foot worked better than two feet. Nonetheless, the changes obtained by using a single limb or both are not significant, and they could be caused by the specific datasets.

Concerning the use of features coming from both feet, among the three classifiers, the best results were achieved with RF and NB when considering the 2C60 dataset (accuracy and F-measure both equal to 0.97 for RF and to 0.95 for NB), while RF performed best when the other two datasets were given as input to the machine learning algorithms (accuracy and F-measure both equal to 0.77 for 3C90 and to 0.98 and 0.97 for 2C90, respectively). Overall, SVM appeared to be the least effective classifier for this purpose on the acquired data (Fig. 3).

Figure 3 Comparative classification results for SVM, RF, and NB classifiers considering both feet.

In Table 4, the results from the RF classifier for the 3C90 dataset are reported in detail for each class and for both feet. The results show that PD patients are well distinguished from the other two classes, whereas the worst results are seen for IH subjects, who could be misclassified, as indicated by the confusion matrix in Fig. 4.

Table 4 Results for 3C90 dataset, applying RF classifier on both feet in terms of precision/class, recall/class, F-measure/class.
Figure 4 Confusion matrix from RF classifier for 3C90 dataset for both feet (FEET). Correct predictions are reported in green, while incorrect predictions are reported in red.

Discussion

This work aimed to investigate how the acquisitions made with inertial sensors, worn on the feet during the performance of MDS-UPDRS III scale tasks, allowed the extraction of significant motor parameters to identify differences between three groups of people (i.e., healthy subjects, people with idiopathic hyposmia, and PD patients) with good accuracy. The motion data were acquired via SensFoot V2 inertial sensor while subjects executed an experimental protocol composed of four motor tasks from the MDS-UPDRS III scale, which is typically used to evaluate PD patients during neurological exams. Acquired data were transmitted to a PC where they were filtered and processed through appropriate algorithms to calculate 23 features per limb that could represent subjects’ motor performances. A data analysis based on significance and correlation tests was implemented to obtain an adequate feature array to use for the final classification step. In this study, multiple comparisons were conducted, including the use of three different supervised machine learning approaches (i.e., SVM, RF, NB) to be applied on three datasets (i.e., 2C60, 2C90, 3C90), for two-group or three-group classification, based on data from a single limb or both (i.e., R_FOOT, L_FOOT, FEET).

The study results reveal that the unobtrusive system used in this work can distinguish between PD patients and HC with high accuracy and precision within a two-group classification. Moreover, going beyond the state of the art, motor data were also acquired from IH subjects, and a further two-group classification that treats IH subjects as healthy people achieved good results. Ultimately, a preliminary approach for three-group classification is proposed. In this case, IH subjects can be clustered into an additional specific class having intermediate motor performance between HC and PD patients. Thus, IH subjects may be identifiable as people at risk of developing the pathology in a prodromal phase. Further, this identification can be achieved using non-invasive, inexpensive tools (i.e., a two-step approach that includes an olfactory test and motor assessment with inertial sensors) instead of traditional imaging techniques.

The results show that the chosen protocol (i.e., toe tapping, leg agility, gait, and rotation) allows extraction of important parameters to identify the motor characteristics for each group. The set of significant and non-correlated features, selected according to the statistical analysis, was composed of parameters computed from all four exercises; thus, each of the four tasks is important to identify the motor performance of the individuals. This finding will encourage researchers to use a comprehensive protocol for motor performance assessment which includes tapping exercises, in contrast with most works that analyzed gait only.2,4,18,25,27,33,34

Using the selected feature array, the system was able to distinguish HC from PD patients with high accuracy and precision using all the applied classifiers, whether considering a single foot or both feet. Values obtained by analyzing the feet together were generally slightly better than those calculated for a single limb, although, in a few cases, a single foot seemed to work better than two feet. However, the differences were minimal and not significant. Further, from a clinical point of view, since PD onset is typically asymmetrical,11 the analysis of a single body side is not justifiable because it does not permit an exhaustive analysis of motor performance, which is particularly crucial in the first stage of the pathology. Thus, the analysis of both feet is recommended to identify the disease even in the first stages, when the motor symptoms could affect only one side.

Concerning the use of both feet, in this work, the best results for two-class classification were achieved by excluding the IH subjects from the dataset (2C60) with RF and NB (0.97 recall and F-measure for RF, and 0.93 recall and 0.95 F-measure for NB). These findings agree with previous works in the literature, which did not discriminate between the two limbs. RF seems to be the best classifier in our work, with 0.97 accuracy, as found also by Arora et al. (0.98 accuracy)4 and by Wahid et al. (0.93 accuracy).34 Even if SVM appears to be the least suitable classifier in this study, the values obtained by applying this algorithm (0.93 sensitivity, 0.97 specificity, 0.95 accuracy) are comparable to or better than those found in the literature by Tien et al. (0.94 sensitivity, 0.96 specificity),33 Kugler and Jaremenko (0.90 sensitivity, 0.90 specificity),21 Wahid et al. (0.89 accuracy),34 Alam et al. (0.96 accuracy),2 and Khorasani and Daliri (0.90 accuracy).18 Moreover, even though NB is not frequently applied in similar studies in the literature, the results obtained in this work are promising (0.95 accuracy) and notably better than those achieved by Wahid et al. (0.84 accuracy),34 who also employed this probabilistic approach.

Thus, the overall results of this work show good discrimination between the HC and PD groups based on the analysis of lower limb motor performance acquired by wearable inertial sensors. The developed system could be applied for quantitative assessment of motion to support neurologists in objective PD diagnosis during clinical examinations.

In contrast with previous works, this study included a group of IH subjects, who are people at risk of developing PD. Since they can reasonably be considered healthy subjects, because the pathology is latent in them and will develop in only 10–12% of IH subjects within 5 years,23,28 the dataset 2C90 was defined and assessed, evaluating 60 subjects as healthy vs. the 30 diagnosed PD patients. Considering the RF classifier and the use of both feet for the assessment, the results were high in terms of accuracy (0.98), sensitivity (0.93), and particularly specificity (1.00).

Ultimately, in dataset 3C90, where each group was assigned to a different class, the classifier performances decreased. However, this worsening was primarily related to a difficulty in discriminating between HC and IH, as shown in Fig. 4, where some IH subjects can be confused with HC, and some healthy individuals can be misclassified as having idiopathic hyposmia. Importantly, the average results achieved in our work with both feet (0.77 recall, 0.77 precision) are better than those found by the only similar work in the literature that analyzed three classes (0.70 recall, 0.73 precision),33 even if the considered groups are different. Tien et al., in fact, considered PD patients with and without gait disturbances and HC. In our study, PD patients are well identified (0.93 class recall, 1.00 class precision) also within 3C90. Thus, the system is able to recognize the motor impairments related to the pathology. The main issue is the limited ability to discriminate between HC and IH, although this is clinically justifiable because an IH subject is a healthy individual with a reduced sense of smell, and only a small fraction of IH subjects will actually develop the disease. For the other IH subjects, hyposmia might not be related to PD, and no worsening in motor performance would be expected. Nevertheless, in the three-group classification, IH subjects are mainly identifiable as a separate group. Thus, the system seems promising in assessing small variations in motor capability as well.

However, this study has some limitations. First, the correlation with clinical scales was not considered in this work. Subjects were divided into two or three groups without considering, for instance, the severity of Parkinson’s disease or the level of idiopathic hyposmia. Thus, it was not possible to analyze whether the PD patients that were incorrectly classified as IH subjects were the ones with a low severity of pathology. Furthermore, it was not determined whether IH individuals that were incorrectly classified as HC actually had only a slight olfactory impairment. Moreover, although the number of considered subjects is comparable to or higher than that of previous works,2,4,6,18,21,25,33,34 it is not large enough to assume that the feature array selected after the significance and correlation tests is the definitive one for identifying differences in people’s motor performance. To validate the set of extracted parameters, it is necessary to increase the number of subjects considered for motor assessment. However, despite the improvements that can be applied in future works, the results obtained from the classification system are high in terms of accuracy within the scope of this paper. Thus, the presented system could reasonably support the neurologist in the objective evaluation of subjects at the beginning of the disease as well, when the typical symptoms are not yet evident but the neurodegenerative process has already begun and the motor performance is just beginning to worsen.

The analysis implemented in this work on motor data acquired by a wearable inertial device for lower limb motion assessment allowed the selection of a feature array that can identify differences between different groups of people. The comparative classification using three different supervised approaches suggests that Random Forest could be the best classifier to differentiate among healthy controls, hyposmic subjects, and PD patients, both in two-group and three-group classification. The results obtained for a two-class analysis are high (0.97 accuracy) and comparable to, or better than, other works in the literature, while, to the best of our knowledge, this is the first work in which motor assessment of IH subjects, as people at risk for developing PD, is evaluated. The system can recognize IH as a separate group in a three-class classification, even if some difficulties are experienced in distinguishing between IH and HC. This difficulty is clinically justifiable because only a small subset of IH subjects will be affected by PD in the following years. Finally, the proposed system could support the neurologist in a quantitative assessment of patients’ motor performance, identifying slight worsening in motion capabilities for objective diagnosis of PD.

In future works, some improvements can be pursued to remove the present limitations. First, the dataset could be enlarged to confirm if the feature array selected in this work is truly adequate to discriminate among the groups. Then, upper limb motor performance could be analyzed as well, because the PD onset and development involve the entire body; thus, crucial additional information could be obtained. Further, clinical information (e.g., PD stage) can be correlated to biomechanical outcomes to improve the assessment of people involved in the study. Finally, a follow-up analysis of the IH subjects could be conducted, particularly for those that were misclassified, to examine if they will develop the disease or if their olfactory impairment is unrelated to PD.