1 Introduction

Around 30,000 people are diagnosed with amyotrophic lateral sclerosis (ALS), about the same number with Huntington’s disease (HD), and 1,000,000 with Parkinson’s disease (PD) each year in the United States [1]. In India, the prevalence of the spectrum of neurological disorders has a mean of 2394 per 100,000 population, which gives a rough estimate of over 30 million people with neurological disorders (excluding neuro-infections and traumatic injuries) [2]. A progressive neurodegenerative disorder such as ALS affects the nerve cells in the brain and the spinal cord. As the motor neurons degenerate, the brain loses its ability to initiate and control muscle movements [3]. Parkinson’s disease primarily affects the dopaminergic (dopamine-producing) neurons in the substantia nigra region of the brain. Its symptoms include bradykinesia, gait and balance problems, rigidity of the limbs, and tremor [4]. A study of the factors that contribute to mobility and gait impairments in Parkinson’s disease [5] found balance to be the most relevant factor. Thus, exercise interventions focusing on balance may be best able to improve gait and mobility in Parkinson’s disease. HD is caused by the expansion of the CAG trinucleotide repeat in the huntingtin gene, which produces an expanded polyglutamine tract in the huntingtin protein. It is a protein-misfolding movement disorder of the basal ganglia whose symptoms include chorea, tremor, motor restlessness, and myoclonus, which cause gait impairments [6]. The symptoms of PD, ALS, and HD, collectively called neurodegenerative diseases (NDDs), are non-specific and easily overlooked. They overlap significantly among the NDDs, which leads to misdiagnosis. According to a report of The Michael J. Fox Foundation for Parkinson’s Research, up to 25% of Parkinson’s disease diagnoses are incorrect. 
Such misdiagnoses put patients on the wrong drugs and delay the correct treatment. Neuroimaging tests, namely computed tomography (CT) or magnetic resonance imaging (MRI), sometimes combined with blood and urine tests, are the current state of the art for NDD diagnosis. However, a recent study [7] concluded that conventional MRI is not a reliable diagnostic tool for ALS, with a sensitivity and specificity of 48% and 76%, respectively. In addition, MRI and CT are expensive, time-consuming, and require specific skills. Hence, there is a need for an alternative diagnostic method that is quick, low-cost, and easy to operate without specific skills. With this motivation, the present work uses gait variable measurements to explore the feasibility of diagnosing NDDs from an individual’s walking pattern.

Previous works

The movement disorders caused by NDDs reduce a person’s ability to walk properly and lead to a disturbed gait cycle. The analysis of gait parameters affected by such diseases has applications in explaining the neural component of locomotion and in developing an automated, noninvasive classification methodology. Among all gait parameters, the spatiotemporal variables of gait cycles require only simple instrumentation and are hence suitable for real-time, low-resource settings [8]. The effectiveness of backward walking on spatiotemporal gait variables has been reviewed [9], with the conclusion that backward training could improve participants’ spatiotemporal gait characteristics and is potentially useful in neurological rehabilitation. A few foot switches are sufficient to provide the stance, swing, and double support intervals of both lower limbs [10, 11]. Because the switches are binary, this information is directly available as time series with minimal computation and does not require extensive pre-processing before analysis. A sensor network has also been proposed that captures knee-ankle data in children while they walk for the purpose of gender classification [12]. These properties make such measurements attractive for neurodegenerative disease classification as well. Various features and classification methods that use the stance, swing, and double support interval time series to classify neurodegenerative diseases have been reported. Time-domain characterization of these gait intervals followed by pattern recognition techniques has shown respectable accuracy [13,14,15]. Introducing a signal turn count (STC) feature along with other time-domain features improved the classification accuracy to 90.32% [16]. Because STC is regarded as a frequency-related parameter [17], it may have introduced a frequency representation of the gait intervals and thereby contributed to the high classification accuracy. 
Motivated by this, our recent work [18] utilized wavelet transformation based time-frequency representation of gait interval and achieved similar accuracy by using less input information i.e. only one gait interval time series. Improvement in the accuracy up to 100% was observed after pooling the wavelet features from all the gait interval time series. Similar results are reported from other researchers where wavelet transform based coherence and entropy have been proved useful for the classification of the control and the NDD patients [19]. Various other feature extraction methods such as maximum signal-to-noise ratio (MSNR), maximum signal-to-noise ratio combined with minimum correlation (MSNR & MC), maximum prediction power combined with minimum correlation (MPP & MC) and principal component analysis (PCA) provided a remarkable classification accuracy for the classification of patients with neurological disorders against controls [20]. Also approaches like deterministic learning theory, empirical mode decomposition, phase synchronization and conditional entropy have been shown of great potential in the categorization of controls and NDD patients [14, 21, 22].

A rule-based classifier (RBC) has the advantage of being an interpretable, “white-box” model in contrast to other available classifiers. Interpretability has often been confusing and underspecified [23]. In the present work, we call a model interpretable when the acquired knowledge can be expressed in a readable form such as if-else rules rather than as matrices or other mathematical representations. Classifiers such as KNN, SVM, and neural networks are generally “black boxes”, that is, their acquired knowledge cannot be read in a comprehensible way. The importance of interpretability lies in the ability of users to understand the model. An interpretable model such as a decision tree can reveal hidden patterns and serve as positive feedback to the user or expert. For example, if an interpretable model is developed to predict the severity of a disease and the user is a clinician, the clinician can bring in expert domain knowledge to relate the features that the decision tree judges critical for classification to clinical symptoms, based on the individual patient’s history and condition. In summary, interpretability helps the user or expert to work with a more generalized model. Although a good predictor is certainly useful in practice, a model that reveals why the outcome was wrong in specific cases is much more meaningful and enables experts to design better models in the future. In addition, a rule-based classifier needs few computational resources for implementation in hardware, and the gait variables used in the present study can be measured easily and quickly. One application of the present work would therefore be a portable, wearable system with which patients can observe any abnormality in their gait pattern at an early stage and consult a physician in time.

We selected the decision tree (DT) as the RBC representation for our work. Decision trees are easy to use, free of ambiguity, and robust even in the presence of missing values [24, 25]. The decision tree was trained on three types of features: 1) autocorrelation-based features, 2) data-driven features, and 3) correlations between time series. Autocorrelation in time provides an explicit estimate of frequency and hence conveys information about the frequency content of the signal without an actual transformation to the frequency domain [26], which saves computational resources. Data-driven features are derived from human observation and combine a qualitative approach with a quantitative one. A human observer can highlight the essential, clinically meaningful parts of the data, thereby providing the quantitative approach with a more relevant subset of the available data. Data-driven features are therefore important because they use the visual information captured by the expert, and they have been shown to be useful in representing various biosignals. For example, data-driven feature extraction has been applied to night-sleep polysomnography (PSG) recordings for sleep/wake stage classification [27]. Finally, correlation features between gait interval time series are used to capture bilateral coordination during walking. It has been shown recently [22] that considering the coordinated locomotor pattern between both legs gives impressive NDD classification accuracy. We first generate a large number of features of all three types, then perform feature selection using mutual information (MI), and finally train a decision tree classifier. We then validate the approach on the challenging task of classifying less severe patients, which is more realistic and meaningful for clinical applications.

2 Materials and methods

2.1 Gait database description

The gait database used is freely available on the PhysioNet web page [28]. The records contain gait interval measurements taken in real time for control subjects (n = 16; 2 males and 14 females) and NDD patients (Parkinson’s disease: n = 15, 10 males and 5 females; Huntington’s disease: n = 19, 6 males and 13 females; amyotrophic lateral sclerosis: n = 13, 10 males and 3 females). The database reports the time intervals of the gait parameters (stance, swing, double support, and stride) from both legs. In the experiment [29], each subject was asked to walk at his or her normal pace along a straight, level hallway of 77 m in length for 5 min without stopping (except to turn at the end of the hallway). Force signals from ultrathin force-sensitive switches inside each subject’s shoes were recorded at a sampling frequency of 300 Hz. These force signals were used to determine the stance, swing, stride, and double support intervals. The database also quantifies the severity of the NDD in each category. For Parkinson’s disease, a Hoehn and Yahr score is provided, which varies from 1.5 to 4. For Huntington’s disease, a total functional capacity measure is provided, which varies from 1 to 12. For patients with amyotrophic lateral sclerosis, the database reports the time since diagnosis as the severity measure. Because the dataset is imbalanced, the present study uses random under-sampling, which balances the class distribution by randomly eliminating majority-class examples. Two scenarios were considered while balancing the dataset: 1) Control vs. Parkinson, Control vs. HD, and Control vs. ALS, and 2) Control vs. NDD. In scenario #1, the smallest category contains 13 subjects, so choosing n = 13 gives an equal number of observations from each category. Therefore, a new dataset was derived with n = 13 from each category. 
For scenario #2 (Control vs. NDD), this new dataset contains 13 control subjects, so 13 NDD patients would be needed to match them. However, the NDD group consists of three classes, and 13 is not divisible by 3, so equal numbers cannot be drawn from all three classes. Therefore n = 12 was chosen for the NDD group, which gives an equal number of observations (4 patients) from each of the three categories, namely Parkinson’s disease, Huntington’s disease, and ALS, and thus a balanced dataset for the Control vs. NDD classification. Table 1 summarizes the demographics of the various groups.
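The balancing procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the subject identifiers are hypothetical placeholders, the seeds are arbitrary, and the function name `undersample` is not taken from the original study.

```python
import random

def undersample(groups, n_per_group, seed=0):
    """Randomly keep n_per_group subject IDs from every group."""
    rng = random.Random(seed)
    return {name: sorted(rng.sample(ids, n_per_group))
            for name, ids in groups.items()}

# Group sizes as reported for the database; the ID strings are hypothetical.
sizes = {"control": 16, "PD": 15, "HD": 19, "ALS": 13}
groups = {g: [f"{g}{i}" for i in range(1, n + 1)] for g, n in sizes.items()}

# Scenario 1: 13 subjects per group (the size of the smallest group, ALS).
scenario1 = undersample(groups, min(sizes.values()))

# Scenario 2: 4 patients per disease give 12 NDD patients,
# matched by 12 randomly chosen controls.
ndd = undersample({g: scenario1[g] for g in ("PD", "HD", "ALS")}, 4, seed=1)
ndd_ids = [s for ids in ndd.values() for s in ids]
control_ids = undersample({"control": scenario1["control"]},
                          len(ndd_ids), seed=1)["control"]
```

Fixing the seed makes the drawn subsets reproducible; a different seed would select different subjects and could change the reported accuracies slightly.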

Table 1 Summary of Demographics and severity measures of different groups

To validate the proposed approach, we used a dataset of less severe patients (n = 5) in each category (PD, HD, and ALS). Classifying less severe patients, who are in the early stages of the disease, is more challenging and has rich clinical applications in assisting clinicians towards a better diagnosis. Table 2 summarizes the demographics of the groups of less severe patients.

2.2 Processing of the data

To exclude start-up effects, we removed the first 20 s of data. As described in the PhysioNet database, significantly different strides were detected when the subjects had to turn around at the end of the hallway. These strides were treated as outliers and were identified as data points lying more than three standard deviations above or below the median value [16]. The outliers were replaced with the median value of the corresponding time series, because simply excluding them would firstly shorten the time series and secondly decrease the variance in the data and cause a bias through under- or overestimation [30]. The median was chosen as the replacement value because it is a measure of central tendency that is very insensitive to the presence of outliers [31]. Figure 1 shows the seven time intervals for a representative sample from each group.
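A minimal sketch of this pre-processing step follows. It assumes that each record provides elapsed times alongside the interval values and that the standard deviation is the sample standard deviation of the whole series (the text does not state which estimator was used); the function names are hypothetical.

```python
import statistics

def drop_startup(times, values, skip_s=20.0):
    """Discard samples recorded during the first skip_s seconds."""
    return [v for t, v in zip(times, values) if t >= skip_s]

def replace_outliers(series, n_sd=3.0):
    """Replace points further than n_sd standard deviations from the
    median with the median, so the series keeps its length."""
    med = statistics.median(series)
    sd = statistics.stdev(series)  # assumption: sample SD of the full series
    return [med if abs(v - med) > n_sd * sd else v for v in series]

# Toy stride series with one turn-around stride of 5.0 s.
cleaned = replace_outliers([1.0] * 20 + [5.0])
```

Because the outlier is replaced rather than dropped, the time stamps of the remaining strides stay aligned across the seven interval series.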

Fig. 1

Time series plots for one representative sample from each class. [a - Control, b – Parkinson’s Disease, c - Huntington’s Disease, d –ALS]. The black box highlights the standard deviation of subsequences from 30 to 60 (x-axis) for the ‘Control’ group which is quite low as compared to other groups. This difference becomes more prominent in the left and right swing interval time series

2.3 Feature extraction

The gait signals used in the present study consist of the seven gait interval time series given in the database: left and right stride intervals, left and right swing intervals, left and right stance intervals, and the double support interval. From the data highlighted in the box in Fig. 1, it can be seen that the standard deviation along certain dimensions is remarkably lower in controls than in NDD patients. Thus, while generating features, care was taken to include all such features in the analysis. Overall, 7546 features were extracted from each sample to aid classification, as described below:

Auto-correlation based features

As mentioned earlier, the autocorrelation-based features provide an explicit indication of the frequency content of the signal. The following autocorrelation analysis was performed and the features were derived as below:

  a) Pearson correlation at different lag values between the elements of the time series was calculated. The lag considered was from 0 to 100. Here, the length of the time series is 106. Thus, this gives us 101 features for each dimension. Formally, auto-correlation is defined as:

$$ \rho (h)=\sum \limits_{i=1}^{N-h}\left({y}_i-\overline{y}\right)\times \left({y}_{i+h}-{\overline{y}}_h\right)/\sqrt{\sum \limits_{i=1}^{N-h}{\left({y}_i-\overline{y}\right)}^2\times \sum \limits_{j=h}^N{\left({y}_j-{\overline{y}}_h\right)}^2} $$
(1)

Where:

  • \( {y}_i \) is the value of the time series at time ‘i’

  • h is the lag

  • N is the total number of time stamps in time series

$$ \overline{y}=\kern0.5em \sum \limits_{i=1}^{N-h}{y}_i/\left(N-h\right) $$
(2)
  • \( \overline{y} \)is the mean of time series from 1 to N-h

$$ {\overline{y}}_h=\sum \limits_{i=h}^N{y}_i/\left(N-h\right) $$
(3)
  • \( {\overline{y}}_h \)is the mean of time series from h to N

  b) Each of these time series was first differenced to obtain another time series of length 105. Then, as in Eq. 1, Pearson correlations at different lag values were calculated. This gave an additional 101 features.

Hence, the autocorrelation-based features total 1414. This can be explained as:

$$ 7\times 101\times 2=1414 $$

Where:

  • 7 is the number of time series for each subject.

  • 101 is the number of time lags considered for each time series.

  • 2 is the number of versions of each time series: the original time series and its first-differenced version.
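The lagged Pearson correlation of Eq. 1 and the resulting feature count can be illustrated with a short sketch. It assumes plain Python lists as input and pairs the first N − h samples with the last N − h samples; the function names and the synthetic sine series are hypothetical and only stand in for a real gait interval series.

```python
import math

def lagged_pearson(y, h):
    """Pearson correlation between y and its copy shifted by lag h."""
    n = len(y) - h
    a, b = y[:n], y[h:]              # first N-h and last N-h samples
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a)
                    * sum((q - mb) ** 2 for q in b))
    return num / den if den > 0 else 0.0  # guard against constant segments

def autocorr_features(y, max_lag=100):
    """Lags 0..max_lag on the series and on its first difference."""
    diff = [b - a for a, b in zip(y, y[1:])]
    return [lagged_pearson(s, h) for s in (y, diff)
            for h in range(max_lag + 1)]

# Synthetic stand-in for one gait interval series of length 106.
series = [1.0 + 0.05 * math.sin(0.3 * i) for i in range(106)]
features = autocorr_features(series)
```

Each series yields 202 values, so seven series give the 1414 features stated above. At the largest lags only a handful of points remain in each segment, so those correlations are estimated from very few samples.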

Data driven features

Data-driven features were identified heuristically by visually inspecting the time series. As mentioned earlier, visual observation suggested a remarkable difference between controls and NDD patients in different time windows. The following features were calculated accordingly:

  a) The mean and standard deviation of each complete time series were calculated and added to the feature list.

  b) For a chosen window size ‘w’, the moving average and moving standard deviation were calculated at each point, i.e. the average and standard deviation of every set of ‘w’ consecutive points were added to the feature set, as shown in Fig. 2. This gives 2 × (107 − w) features for each time series and window size ‘w’. The window sizes used were 5, 10, 20, 30, and 40 timestamps.

Fig. 2

Statistical features were extracted from moving windows of different sizes. Mean and Standard deviation for the data points inside the window frames were calculated and added to the features pool

Hence, the moving-window statistics contribute 6020 data-driven features. This can be explained as follows. Suppose ‘w’ is 5; then the first window covers points 1 to 5, the second 2 to 6, the third 3 to 7, and the last 102 to 106. Thus, in total we have 106 − 4 windows. This can be written as:

106 − 4 = 106 − (5 − 1) = 107 − 5

Thus, (107 − w) windows, each yielding a mean and a standard deviation, were obtained for each time series and window size. Summing over the window sizes w = [5, 10, 20, 30, 40] and the 7 time series:

$$ \sum \limits_w7\times \left(107-w\right)\times 2=14\times \left(102+97+87+77+67\right)=6020 $$

In addition, the mean and standard deviation of the complete time series contribute 70 features:

$$ 7\times 5\times 2=70 $$

Where:

  • 7 is the number of time series for each subject.

  • 2 is the number of statistics (mean and standard deviation), so 7 × 2 = 14 features describe the complete time series.

  • 5 is the number of window sizes; the same 14 features were added 5 times, once for each window size.
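The window arithmetic can be checked with a small sketch. It assumes a series of length 106 and uses the sample standard deviation, which the text does not specify; `window_features` is a hypothetical name and the toy series carries no clinical meaning.

```python
import statistics

WINDOWS = (5, 10, 20, 30, 40)

def window_features(y, w):
    """Mean and standard deviation of every run of w consecutive points."""
    feats = []
    for start in range(len(y) - w + 1):
        chunk = y[start:start + w]
        feats.append(statistics.mean(chunk))
        feats.append(statistics.stdev(chunk))  # assumption: sample SD
    return feats

# Toy series of length 106 standing in for one gait interval series.
series = [1.0 + 0.01 * (i % 7) for i in range(106)]
per_series = sum(len(window_features(series, w)) for w in WINDOWS)
```

Here `per_series` is 860, and multiplying by the seven time series reproduces the 6020 moving-window features.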

Inter-dependence between time series

These features capture inter- and intra-limb coordination among the various gait intervals. The following features were extracted under this category:

  a) The correlation between each pair of the seven time series was determined and added to the feature list. This gave 21 additional features. Formally, correlation is defined as:

$$ cor\left(X,Y\right)=\sum \limits_{i=1}^N\left({y}_i-\overline{y}\right)\times \left({x}_i-\overline{x}\right)/\sqrt{\sum \limits_{i=1}^N{\left({y}_i-\overline{y}\right)}^2\times \sum \limits_{j=1}^N{\left({x}_j-\overline{x}\right)}^2} $$
(4)

Where:

  • N is the total number of time stamps in each time series.

  • \( {x}_i \) and \( {y}_i \) are the values of the time series X and Y at time ‘i’.

$$ \overline{y}=\kern0.5em \sum \limits_{i=1}^N{y}_i/(N) $$
(5)
  • \( \overline{y} \) is the mean of time series Y

$$ \overline{x}=\kern0.5em \sum \limits_{i=1}^N{x}_i/(N) $$
(6)
  • \( \overline{x} \) is the mean of the time series X

  b) Each of the time series was first differenced to obtain another set of seven time series. Then, the correlation between each pair of these differenced time series was used as a feature. This added another 21 features.

This gives 42 additional features, which can be explained as:

$$ 21+21=42 $$

Here 21 is the number of pairwise correlations among the seven time series (7 × 6 / 2), computed once for the original time series and once for their first-differenced versions.
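As an illustration of Eq. 4 applied to all pairs, the sketch below computes the 42 inter-dependence features for one subject. It assumes the seven series have equal length; the function names and the synthetic input are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (Eq. 4)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0  # guard against constant series

def cross_corr_features(series):
    """Pairwise correlations of the series and of their first differences."""
    diffs = [[b - a for a, b in zip(s, s[1:])] for s in series]
    feats = [pearson(a, b) for a, b in combinations(series, 2)]
    feats += [pearson(a, b) for a, b in combinations(diffs, 2)]
    return feats

# Seven synthetic series of length 106 standing in for one subject.
subject = [[math.sin(0.1 * (k + 1) * i) for i in range(106)] for k in range(7)]
features = cross_corr_features(subject)
```

`itertools.combinations` enumerates each unordered pair once, which is why seven series yield 21 correlations per version.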

Hence, total features were:

$$ \mathrm{Sum}\ \left(\left[42,1414,6020,70\right]\right)=7546 $$

2.4 Feature selection

The mutual information (MI) between each feature and the class label was determined, and the features with the highest MI were retained. A high MI value indicates strong dependence between the two variables, so that knowing one reduces the uncertainty about the other. A low MI value indicates that the two variables are mostly independent, so one variable cannot be used to predict the other.

Formally, MI between two random variables is defined as:

$$ MI\left(X;Y\right)=\iint p\left(x,y\right)\times \log \left(\frac{p\left(x,y\right)}{p(x)p(y)}\right) dxdy $$
(7)

Where p(x, y) is the joint probability density function of X and Y, and p(x) and p(y) are the marginal probability density functions of X and Y, respectively. In the present study, X is a feature and Y is the associated class label. For a discrete-valued random variable, the integral in Eq. 7 is replaced by a summation and the probability density function by a probability mass function. Here, the class label is a discrete-valued random variable. A high MI value indicates a high chance of predicting the class correctly from the feature; hence features with high MI values were used for further analysis.

The shortlisted features were used for constructing the classification tree. Features with low MI scores were not used for classification, as they may get included in the decision rules at the lower branches and thus lead to poor rules and poor trees. Thus, only the top 500 features were used for constructing the classification tree. Figure 3 shows the flowchart of the methodology employed in the present approach.
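The ranking step can be sketched as below. The estimator is an assumption: it discretizes each feature into equal-width bins and applies the summation form of Eq. 7 with natural logarithms, which is not necessarily the estimator used in the study. The function names, the bin count, and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(feature, labels, n_bins=4):
    """Histogram estimate of MI (in nats) between a continuous feature
    and a discrete class label."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_bins or 1.0   # constant feature: a single bin
    bins = [min(int((v - lo) / width), n_bins - 1) for v in feature]
    n = len(labels)
    pxy = Counter(zip(bins, labels))
    px, py = Counter(bins), Counter(labels)
    return sum((c / n) * math.log(c * n / (px[b] * py[l]))
               for (b, l), c in pxy.items())

def top_k_features(columns, labels, k=500):
    """Indices of the k feature columns with the highest MI score."""
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

labels = [0] * 10 + [1] * 10
informative = [0.0] * 10 + [1.0] * 10   # separates the classes perfectly
noise = [0.0, 1.0] * 10                 # unrelated to the labels
best = top_k_features([noise, informative], labels, k=1)
```

With two balanced classes the score is bounded by ln 2 ≈ 0.693 nats, which the perfectly separating toy feature attains, while the unrelated feature scores 0.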

Fig. 3

Flow chart representing features, their selection methodology and classification

2.5 Classification and evaluation

The decision tree classifier implemented as ‘DecisionTreeClassifier’ in the scikit-learn library for Python (version 0.19.0) [32] was used for classification, with default parameter values for training. Even so, different runs of the same algorithm may produce slightly different results because of randomness inherent in the algorithm. At each split, the algorithm visits the features in a random order, without regard to any specific ordering, and finds the best split point for each feature in turn until the whole list of features has been examined. So, if two split points are equally good, the order in which they are found decides which one is used, and this brings slight randomness into the algorithm. The decision tree classifier was chosen because the classifier needs to be trained on 500 features with very few training samples. The classifier itself performs feature selection when finding the best splits. In addition, the decision tree provides interpretability (a set of rules) to the overall classification system. In contrast, SVM and ANN classifiers would tend to overfit on such a training set, where the number of features far exceeds the number of available training samples, and they do not provide a set of rules for interpretation. A K-nearest-neighbor classifier would also become inefficient due to the curse of dimensionality [33]. A leave-one-out cross-validation scheme was used for evaluating the classifier. In each iteration of the validation scheme, one sample was removed from the complete dataset for testing, the remaining samples were used for feature selection and for training the classifier, and the prediction of the classifier on the test sample was recorded. The performance of the classifier was evaluated using sensitivity, specificity, and accuracy, calculated as follows:
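The validation loop can be sketched generically as follows. The important point is that feature selection happens inside each fold, on the training samples only. To keep the example self-contained, a toy one-feature threshold rule stands in for MI selection plus the scikit-learn decision tree; that stand-in, the function names, and the data are assumptions and not the method of the study.

```python
def leave_one_out(X, y, fit):
    """Hold out each sample once; fit(X_train, y_train) must return a
    predict(x) callable and must do any feature selection itself."""
    predictions = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(X_train, y_train)   # selection + training, fold only
        predictions.append(predict(X[i]))
    return predictions

def fit_toy(X_train, y_train):
    """Toy stand-in: pick the feature with the largest class-mean gap
    and threshold it at the midpoint of the two class means."""
    def class_mean(j, c):
        vals = [row[j] for row, lab in zip(X_train, y_train) if lab == c]
        return sum(vals) / len(vals)
    j = max(range(len(X_train[0])),
            key=lambda k: abs(class_mean(k, 1) - class_mean(k, 0)))
    m0, m1 = class_mean(j, 0), class_mean(j, 1)
    thr = (m0 + m1) / 2
    return lambda x: int((x[j] > thr) == (m1 > m0))

X = [[0.1, 5.0], [0.2, 4.0], [0.15, 5.5], [0.9, 4.5], [1.0, 5.2], [0.95, 4.8]]
y = [0, 0, 0, 1, 1, 1]
predictions = leave_one_out(X, y, fit_toy)
```

Selecting features on the full dataset before the loop would let information from the held-out sample leak into training, which is why `fit` receives only the training fold.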

$$ \mathrm{Sensitivity}=\frac{\mathrm{TP}}{\left(\mathrm{TP}+\mathrm{FN}\right)}\times 100 $$
(8)
$$ \mathrm{Specificity}=\frac{\mathrm{TN}}{\left(\mathrm{TN}+\mathrm{FP}\right)}\times 100 $$
(9)
$$ \mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\left(\mathrm{TP}+\mathrm{FN}+\mathrm{TN}+\mathrm{FP}\right)}\times 100 $$
(10)

Where TP is true positive, TN is true negative, FP is false positive and FN is false negative value for the evaluation of classification performance. A confusion matrix was also constructed for further evaluation of classifier.
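Equations 8 to 10 translate directly into code. The sketch below assumes binary labels with the diseased class coded as 1, which is a convention chosen here for illustration; the function name and the example predictions are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy in percent (Eqs. 8-10)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Hypothetical predictions: 3 of 4 patients and 3 of 4 controls correct.
result = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 0])
```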

3 Results

A one-way analysis of variance (ANOVA) test was conducted to detect any differences in demographics across the groups in Table 1. The results show a significant difference across the groups in all demographic variables except weight, as indicated by the corresponding p values in Table 3.

Post hoc analysis showed a significant difference (α = 0.05) across the groups with respect to age, height and gait speed. The findings are shown in Table 4. As shown in Table 4, control group was significantly different from ALS in gait speed and from Parkinson in age and gait speed.

The hyper-parameter value of 500 for feature selection was chosen by cross-validation over 4 values (250, 500, 1000, and 1900) in one of the classification tasks. The cross-validation results did not differ much in terms of accuracy. Selecting 500 features ensured MI values of nearly 0.5 or more for all the classes and was the natural choice as a trade-off between the number of selected features and high MI values. Further analysis of the top 500 features showed that all of them were data-driven features. Figure 4 shows the MI score between the features and the class labels for the different binary classification tasks.

Fig. 4

MI score between features and class labels for different binary classification tasks. 500 top features were selected for constructing classification tree in all the tasks

Figure 5 gives the decision trees for each of the binary classification tasks. The decrease in the Gini value from a higher node to a lower node of a tree denotes the strength of the split; a larger decrease indicates a better split for a given tree. The Gini impurity for a set of items with J classes (1, 2, 3 … J), where \( {p}_i \) denotes the fraction of items labeled with class i, is given by:

$$ gini=1-\sum \limits_{i=1}^J{p_i}^2 $$
(11)
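Equation 11 and the notion of split strength can be illustrated with a short sketch. The weighted form of the impurity decrease used here is the standard CART definition and is stated as an assumption about how the decrease is measured; the function names and labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels (Eq. 11)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of
    its two children."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - children

parent = ["NDD", "NDD", "control", "control"]
perfect = gini_decrease(parent, ["NDD", "NDD"], ["control", "control"])
```

A balanced two-class node has impurity 0.5, and a split that separates the classes completely removes all of it.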

Figure 5 (A) depicts a decision tree which classifies Parkinson’s disease versus control. If the standard deviation of subsequence from 16 to 45 in right stride interval time series is less than or equal to 0.031, then the subject is classified as control. If this statistic is greater than 0.031, then the subject is classified as Parkinson’s disease patient. Similarly, all other trees can be deciphered. A summary of features in all trees of Figs. 5 and 6 is given in Table 5.
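To show how such a tree reads as an if-else rule, the rule described for Fig. 5 (A) can be written out directly. The sketch assumes that the subsequence indices 16 to 45 are 1-based and inclusive (30 points, matching a window size of 30) and that the statistic is the sample standard deviation; the function name and the toy series are hypothetical.

```python
import statistics

def classify_pd_vs_control(right_stride, start=16, end=45, threshold=0.031):
    """Single-split rule of Fig. 5 (A) on the right stride interval series."""
    window = right_stride[start - 1:end]   # assumption: 1-based, inclusive
    if statistics.stdev(window) <= threshold:
        return "control"
    return "Parkinson's disease"

# Toy series: a steady walker and a highly variable walker.
steady = [1.0 + 0.001 * (i % 2) for i in range(106)]
variable = [1.0 + 0.1 * (i % 2) for i in range(106)]
labels = (classify_pd_vs_control(steady), classify_pd_vs_control(variable))
```

A rule of this form needs only one window of stride intervals and a single comparison, which is what makes it attractive for low-resource hardware.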

Fig. 5

Decision Tree classifiers obtained for the 4 binary classification tasks. The features used in these trees are described in Table 5. X [n] denotes the statistic value of feature ‘n’, gini indicates the gini value of all samples within the box, samples indicate the number of training samples reaching that box in the decision tree, and value given by [x, y] denotes the ‘x’ diseased sample and ‘y’ healthy/control sample reaching that box. Tables 6, 7, and 8

Fig. 6

Confusion matrices for all the categories of binary classification

4 Discussion

Previous studies have demonstrated the importance of statistical [16], frequency [18], and bilateral limb coordination features [22] in NDD classification. We used a combination of these approaches, guided by human visual observation of the data. This combined approach led to high-dimensional data with more than 7000 features, which was reduced to 500 based on the mutual information criterion. The MI-based analysis revealed the dominance of data-driven features over the auto- and cross-correlation based features: all of the top-ranking 500 features, with MI values of nearly 0.5 or greater, were data-driven features. Using these data-driven features in the current approach produced better accuracy than previously reported for the individual disease vs. control classifications. For example, the classification of Parkinson’s disease was achieved with 90.32% accuracy in previous work [16] using time-domain and STC features, whereas the current work improved the accuracy to 92.3% by using data-driven features. Similarly, previous work [13] reported a classification accuracy of 82.8% for ALS using the mean of the left-foot stride interval and the modified Kullback-Leibler divergence (MKLD). Our work shows the superiority of data-driven features by classifying ALS with 96.2% accuracy. It has to be noted that the classifiers in both previous studies differ from the one in our work, which may account for part of the difference in classification accuracy. The present study used a decision tree rather than the SVM of previous studies because a decision tree is more interpretable than an SVM. However, for NDD vs. control classification, the present approach underperforms compared with some previously reported accuracies [34]. We attribute the higher accuracy in the previous work [34] to the unbalanced dataset used. The previous work used an unbalanced dataset for the classification (20 patients with HD, 13 patients with ALS, 15 patients with PD, and 16 healthy controls) and for NDD vs. control (48 patients with NDD and 16 healthy controls), which might have led to overfitting and biased classification accuracy. In the present work, in order to develop a classifier that is unbiased and not overfit, the number of subjects in each NDD category was reduced to obtain a balanced dataset (n = 13 in each category) and, for NDD vs. control, 12 patients with NDD and 12 healthy controls. Further, pooling Huntington’s disease into the NDD group might have lowered the classification accuracy by narrowing the classification margin, since the classification accuracy for Huntington’s disease vs. control was lower, at 88.5%. In contrast to classifiers such as the Support Vector Machine (SVM) used in previous studies, a decision tree classifier performs feature selection itself by choosing the best split point among all features to separate the data, so the final classifier is based on only a few features, which minimizes overfitting. Leave-one-out cross-validation (LOOCV) was further used to avoid redundant features, in an attempt to minimize overfitting on the small dataset of the present study.

A detailed comparison of present work with previous reported work is shown in Table 9.

It is interesting to see that the visual patterns were dominant and were successfully transferred to the decision tree, similar to previous research [27]. Recently, it has also been shown that expert knowledge improves the automatic probabilistic classification of gait joint motion patterns in children with cerebral palsy [35]. In this paper we followed a similar mechanism. The visual input from the researchers, shown in Fig. 1, was that in control subjects the standard deviation of the data points from 30 to 60 in all gait variables differs strongly from that of NDD patients. This observation was embedded in the feature list and proved efficient, as all dominant features selected by the decision tree were data-driven features. Although the data-driven features provided superior accuracy, they led to a very high-dimensional feature space. Mutual information (MI) was used for data reduction in the present work. The MI approach was preferred over Principal Component Analysis (PCA) for dimensionality reduction because PCA loses the physical interpretation of the original variables after linearly transforming them, while MI retains it, which suits rule-based classification. A previous study [21] reported that the random forest classifier has the best average performance among all classifiers. A random forest is an ensemble of decision tree classifiers and is not as interpretable as a single decision tree. Hence, classification was done using a single decision tree classifier. Previously, random forest has shown lower classification accuracy than SVM on the same database [34]; in the present work, however, we could reach accuracy on a par with it using the hybrid feature approach and a decision tree.

With the encouraging results in Table 8 for classification of less severe patients, we believe the present work has the potential to be translated for the benefit of both the physician and the patient. On the one hand, the physician can use the rule-based classifier to identify the stage of the patient and thus decide the diagnosis; on the other hand, the system is portable enough to be used as a wearable system by the patient, to observe any abnormality in gait pattern at the initial stages and consult a physician in time. Despite the overlapping walking speeds at early stages of NDDs, as shown in Table 2, the rule-based classifier provides respectable classification accuracy, showing the impact of the approach in the present work. Thus, with a larger number of participants, the present approach has scope to improve the classification accuracy in less severe participants.

Table 2 Summary of Demographics and severity measures for less severe patients (n = 5)
Table 3 ANOVA Test for demographics differences across groups
Table 4 Post Hoc test for multiple comparisons across the groups
Table 5 The description of various features employed in the decision trees
Table 6 Description of various features employed in the decision trees for less severe patients
Table 7 Sensitivity, Specificity and Accuracy (all in %) values of the classifier for the classification
Table 8 Sensitivity, Specificity and Accuracy (all in %) values of the classifier for the classification of less severe patients
Table 9 Comparisons of current approach to previously reported approaches

The motivation for adapting the present methodology for real-time implementation is twofold. First, it uses minimal computational resources, for example being implementable on any smartphone, so that the overall system is portable, wearable and usable as a home-based solution. In practice, the present approach needs only a switch-based insole and a smartphone to implement the proposed system. With this home-based solution we target the elderly population who are unable to visit clinics on a regular basis. Second, a real-time implementation will facilitate quick diagnosis, saving the time of both physicians and patients.

The present study has some limitations. We utilized only one classifier, but this is attributable to the choice of a rule-based classification system. Rule-based classification is favorable for real-time systems, and a shoe with integrated wearable sensors is under development in the authors’ laboratory for real-time implementation; it will be reported in future publications. The dataset imbalance in the present study was addressed by random under-sampling; other balancing methods could be adopted in future to improve the accuracy further. Future work will primarily involve real-time implementation of the current approach. The present work studied data available online, and therefore a real-time study is required for robust evaluation of the proposed approach. In addition, it will be interesting to see whether the methods can be helpful for home-based diagnosis of neurodegenerative disease patients who are at an early stage. Such an application would be of tremendous use in improving the quality of life of these patients.

5 Conclusion

The present work generated a new set of features from gait signals, which were more effective for classification than those of a recently published study. Simple feature selection was then performed, and finally a single decision tree classifier was trained. This method achieved accuracies of 88.5%, 92.3%, 96.2%, and 87.5% when classifying controls from HD, PD, ALS, and NDD, respectively. The method was also effective in classifying the less severe patients, indicating the potential of the work for meaningful clinical application. The decision tree provided a set of rules for classification, which added interpretability to the classifier. This research demonstrated the importance of the generated features, which had not been used in prior studies. These features should be taken into account in further studies on this problem.
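For readers who wish to relate the reported percentages to subject counts, the short sketch below shows the arithmetic. It assumes that each accuracy is the fraction of correctly classified subjects in the balanced sets described in the Discussion (13 + 13 = 26 subjects for each disease vs. control, and 12 + 12 = 24 for NDD vs. control). The function names are hypothetical, and the confusion-matrix split passed to `confusion_metrics` is an illustrative assumption, not a reported result.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) in percent from confusion counts."""
    sensitivity = 100.0 * tp / (tp + fn)   # patients correctly identified
    specificity = 100.0 * tn / (tn + fp)   # controls correctly identified
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

def correct_count(accuracy_percent, n_subjects):
    """Number of correctly classified subjects implied by a rounded accuracy."""
    return round(accuracy_percent * n_subjects / 100.0)

# Reported accuracies and the assumed balanced sample sizes.
reported = {"HD": (88.5, 26), "PD": (92.3, 26), "ALS": (96.2, 26), "NDD": (87.5, 24)}
for name, (acc, n) in reported.items():
    print(name, correct_count(acc, n), "of", n)

# Hypothetical split of 23 correct out of 26 (NOT taken from the paper's tables).
sens, spec, acc = confusion_metrics(tp=12, fn=1, tn=11, fp=2)
print(round(sens, 1), round(spec, 1), round(acc, 1))
```

Under these assumptions, a single additional misclassified subject changes the accuracy by roughly four percentage points, which illustrates why a larger cohort would be needed to compare classifiers more finely.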