1 Introduction

Neurodegenerative diseases (NDDs) are a group of disorders of the central nervous system that cause severe disabilities, which worsen over time until eventual death [1]. According to statistics from the National Institute of Environmental Health Sciences (NIEHS), millions of people around the world suffer from different kinds of NDDs [2]. Over the past three decades, the number of deaths attributed to NDDs has increased by 39%, while disability-adjusted life-years (DALYs), the sum of years of life lost and years lived with disability, have increased by 15% [2]. The most common NDDs include Parkinson’s Disease (PD), Alzheimer’s Disease (AD), Huntington’s Disease (HD), and Amyotrophic Lateral Sclerosis (ALS); this work focuses on three of them: PD, HD, and ALS. Although no curative options exist for NDDs, several pharmaceutical interventions can manage the severity of the symptoms. The success of such treatments, however, relies heavily on early detection of the disease to limit the damage incurred. Because all NDDs affect the nervous system, early detection through gait patterns has been the subject of extensive research since the start of this century, as it offers a non-invasive and practical method of diagnosis [3, 4]. Advances in gait studies have established early detection measures for NDDs and, more importantly, have made it possible to differentiate normal aging gait patterns from those specific to a particular neurological disorder, such as PD, AD, HD, and ALS [5,6,7,8]. These advances were made possible by the development of machine learning technologies and advanced sensors, particularly since these anomalies, especially at their onset, are imperceptible to the human eye. Nevertheless, because of overlapping symptoms, classifying and differentiating NDDs remains a challenging task that requires further advances in such technologies [9].

Building a system that can distinguish three NDD types, namely PD, HD, and ALS, from normal aging gait requires a thorough understanding of how these conditions alter normal gait patterns. Gait dysfunction in PD is a marker of moderate and advanced stages of the disease. Although freezing of gait is generally the biggest concern, it is not considered a reliable diagnostic measure because it is not reproducible in clinical settings. PD is therefore evaluated using the Unified Parkinson’s Disease Rating Scale (UPDRS) [10] or the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) [11]. In general, gait abnormalities in PD include low speed, reduced step length and balance, and increased double-limb support and cadence [12]. Difficulties with gait initiation, turning, freezing of gait, and postural control are also known markers, but they are not normally considered in machine learning methods for gait analysis, because these portions of the data are usually truncated to minimize noise. For Huntington’s disease (HD), the Total Motor Score (TMS) of the Unified Huntington’s Disease Rating Scale (UHDRS) dedicates only 12 of its 124 points to gait, which limits the role of gait alone in determining the prognosis or progression of the disease [13]. What distinguishes HD from other NDDs is the higher variability of its gait cycles rather than the symptoms themselves, which include, among others, bradykinesia (slowness of movement), truncal sway, and altered stride, stance, and swing times. Thus, the best measures for HD are the coefficients of variation of stride, swing, and stance times [14]. ALS patients also show a certain degree of gait-cycle variability, but, more specifically, their average stride time and gait cycle are longer than those observed in HD and PD [15].
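As a rough illustration of the variability measure mentioned above, the sketch below computes the coefficient of variation (CV = SD / mean, expressed as a percentage) of stride, swing, and stance times. It assumes the sample standard deviation (ddof=1) is used, and the interval values are hypothetical, not taken from the dataset.

```python
import numpy as np

def coefficient_of_variation(intervals):
    """Coefficient of variation (%) of a series of gait intervals.

    CV = 100 * sample standard deviation / mean; a larger CV means more
    stride-to-stride variability.
    """
    x = np.asarray(intervals, dtype=float)
    return 100.0 * x.std(ddof=1) / x.mean()

# Hypothetical stride, swing, and stance times (seconds) for a single subject.
gait_intervals = {
    "stride": [1.10, 1.25, 0.98, 1.32, 1.05],
    "swing": [0.40, 0.47, 0.35, 0.50, 0.38],
    "stance": [0.70, 0.78, 0.63, 0.82, 0.67],
}
for name, values in gait_intervals.items():
    print(f"CV of {name} time: {coefficient_of_variation(values):.1f}%")
```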

The normal walking cycle is standardized and divided into several phases, as shown in Fig. 1a. Studies that have analyzed gait patterns in NDD patients fall into two groups, statistical analysis-based and machine learning-based, both of which rely on careful examination of the walking cycle to quantify deviations from healthy stride-to-stride fluctuations. Altered stride-to-stride fluctuations can be observed in the walking patterns of subjects with NDDs, and they form the basis of many earlier statistical studies [4, 9, 16]. These fluctuations follow a fractal-like pattern that several studies have analyzed using Detrended Fluctuation Analysis (DFA) [17]. Healthy young subjects, by contrast, produce persistent (long-range correlated) stride-to-stride fluctuations while walking [17, 18].
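The following is a minimal sketch of first-order DFA, written for illustration rather than taken from the cited studies: the mean-removed series is integrated, split into non-overlapping windows, linearly detrended within each window, and the scaling exponent α is the slope of log F(n) against log n. By the usual interpretation, α ≈ 0.5 indicates uncorrelated fluctuations, 0.5 < α ≤ 1 indicates persistent long-range correlations, and α ≈ 1.5 corresponds to a random walk. The window sizes and the synthetic test series are assumptions.

```python
import numpy as np

def dfa_alpha(series, window_sizes):
    """First-order DFA scaling exponent (alpha) of a 1D series."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())          # integrated, mean-removed series
    fluctuations = []
    for n in window_sizes:
        t = np.arange(n)
        sq_residuals = []
        for w in range(len(profile) // n):     # non-overlapping windows of length n
            segment = profile[w * n:(w + 1) * n]
            trend = np.polyval(np.polyfit(t, segment, 1), t)  # local linear trend
            sq_residuals.append(np.mean((segment - trend) ** 2))
        fluctuations.append(np.sqrt(np.mean(sq_residuals)))  # F(n)
    # alpha is the slope of log F(n) versus log n
    slope, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)
    return slope

rng = np.random.default_rng(0)
sizes = np.array([8, 16, 32, 64, 128, 256])
white_noise = rng.normal(size=4096)              # uncorrelated: alpha near 0.5
random_walk = np.cumsum(rng.normal(size=4096))   # Brownian: alpha near 1.5
print(round(dfa_alpha(white_noise, sizes), 2), round(dfa_alpha(random_walk, sizes), 2))
```

Applied to a stride-interval series, the same function would return the exponent used to characterize how persistent a subject's fluctuations are.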

Fig. 1

a Different phases of a single gait cycle. b The placement of the force sensor inside the shoe. c The representation of the vGRF signals captured from both feet

Classical machine learning techniques for NDD detection rely on extracting time or distance features, and these approaches fall mainly into two categories. The first uses parameters calculated from the gait cycles, such as stride duration, stance time, and swing time, as features for time series analysis. Evidence that these values fluctuate in NDD patients has encouraged many researchers to follow this approach [19, 20]. For example, Yang et al. [21] extracted features from the time series of gait parameters, applied three feature selection algorithms to select the best features, and then used a Support Vector Machine (SVM) classifier to make predictions. The second approach extracts features from the Ground Reaction Force (GRF) signal of walking gait; for example, Alaskar et al. extracted 18 temporal and spectral features from the vertical Ground Reaction Force (vGRF) to detect three types of NDDs [9]. However, this approach is less widely used for NDD detection than the first.
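To make the first category concrete, the sketch below summarizes each gait-parameter time series with a few hand-crafted statistics, keeps the most discriminative ones, and trains an SVM with scikit-learn. It is not the pipeline of Yang et al. [21]: the summary features are hypothetical, a single ANOVA F-test selector stands in for their three feature selection algorithms, and the data are synthetic. The training accuracy printed here only checks the mechanics and is not a valid performance estimate.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def summary_features(intervals):
    """Hypothetical hand-crafted features of one gait-parameter time series."""
    x = np.asarray(intervals, dtype=float)
    return np.array([x.mean(), x.std(), x.std() / x.mean(), np.median(x),
                     np.percentile(x, 90) - np.percentile(x, 10)])

rng = np.random.default_rng(42)
# Synthetic stride-interval series: "patients" (label 1) are slower and more variable.
series = ([rng.normal(1.05, 0.02, 200) for _ in range(20)]
          + [rng.normal(1.20, 0.08, 200) for _ in range(20)])
X = np.vstack([summary_features(s) for s in series])
y = np.array([0] * 20 + [1] * 20)

# Standardize, keep the 3 highest-scoring features (ANOVA F-test), then classify.
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=3), SVC(kernel="rbf"))
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```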

Several deep learning-based studies have been conducted in this domain in recent years. Macchi et al. used a multi-channel 1D Convolutional Neural Network (CNN) to detect PD from 8-channel force sensor data collected from the foot [22], while Paragliola et al. proposed a Long Short-Term Memory (LSTM) network that processes the time series of gait parameters to predict PD [23]. Other variations convert the 1D vGRF signal into a 2D representation before deep learning analysis. For instance, Setiawan et al. converted vGRF signals into spectrograms using the Continuous Wavelet Transform (CWT) and then used a CNN to classify three NDDs from the spectrogram images [24]. Similarly, Pham et al. used Fuzzy Recurrence Plots (FRP) to convert gait parameter time series into texture images, which were then classified with an SVM to predict NDDs [25]. Erdas et al. took a very different approach: gait parameters were converted into QR codes, and the time series of the 2D QR code images was processed using a CNN-LSTM network to predict NDDs [26].
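As an illustration of the 1D-to-2D conversion, the sketch below computes a CWT magnitude image (scalogram) of a signal with a complex Morlet wavelet using NumPy only. It is not the implementation of Setiawan et al. [24]; the wavelet, its center frequency w0 = 6, the scale range, and the synthetic stand-in for a vGRF segment are all assumptions. Libraries such as PyWavelets provide ready-made CWT routines.

```python
import numpy as np

def morlet_scalogram(signal, scales, w0=6.0):
    """|CWT| of a real 1D signal with a complex Morlet wavelet, one row per scale."""
    x = np.asarray(signal, dtype=float)
    rows = []
    for s in scales:
        # Truncate the wavelet at about 4 scales, but never longer than the signal.
        half = min(int(np.ceil(4 * s)), (len(x) - 1) // 2)
        t = np.arange(-half, half + 1) / s
        psi = np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-t ** 2 / 2) / np.sqrt(s)
        # Correlating x with psi = convolving with the conjugated, time-reversed wavelet.
        coeffs = np.convolve(x, np.conj(psi[::-1]), mode="same")
        rows.append(np.abs(coeffs))
    return np.vstack(rows)  # shape: (len(scales), len(signal))

fs = 300                                   # sampling rate of the dataset (Hz)
t = np.arange(fs) / fs                     # one second of signal
vgrf_like = np.sin(np.pi * t) ** 2 + 0.3 * np.sin(2 * np.pi * 2 * t)  # synthetic stand-in
image = morlet_scalogram(vgrf_like, scales=np.arange(1, 33))
print(image.shape)  # (32, 300)
```

Each row of `image` corresponds to one scale, so the result can be treated as a single-channel image for a 2D CNN.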

Recent studies on NDD classification have used either the gait parameters or the GRF signals as the input to their learning algorithms. Although these methods have produced good results in many cases, we hypothesized that combining both input types could improve prediction capability. In this study, we therefore present a novel parallel-path neural network architecture that processes GRF signals and hand-crafted features simultaneously. Each parallel path in the network extracts features from a particular input channel. The vGRF signal and its transformations were processed by a ConvMixer [27] architecture, which outperformed a regular CNN architecture. In addition, multiple transformations were applied to the raw vGRF signal, and the resulting signals were used as additional input channels to boost performance. The proposed algorithm was used to detect and differentiate three types of NDDs, namely PD, HD, and ALS. The key contributions of this study can be summarized as follows:

  • We developed a framework to identify NDDs from walking gait patterns captured by force sensors with an average of 96.75% accuracy.

  • We proposed a novel parallel-path neural network architecture that can take advantage of both the vGRF signals and hand-crafted features to produce better results than the vGRF signals alone.

  • We experimented with different transformations of the vGRF signals as inputs to our model to improve its accuracy.

  • To the best of our knowledge, this paper presents the first exploration of the ConvMixer architecture in the 1D signal domain, where it outperformed a regular CNN architecture in our experiments.

2 Materials and methods

2.1 Dataset description

The gait dataset used in this work, the “Gait Dynamics in NDD Database”, was provided by Hausdorff et al. [28] and is available on PhysioNet [29] (https://physionet.org/content/gaitndd/1.0.0/). The database consists of gait recordings collected from 64 subjects between 1997 and 2006. Among the subjects, 16 were healthy (14 women and 2 men, age: mean ± SD: 39.3 ± 18.5 years), 20 had HD (14 women and 6 men, age: mean ± SD: 47.7 ± 12.2 years), 15 had PD (10 men and 5 women, age: mean ± SD: 66.8 ± 10.9 years), and 13 had ALS (10 men and 3 women, age: mean ± SD: 55.6 ± 12.8 years). The HD subjects were evaluated using the Unified HD Rating Scale (UHDRS) to obtain their total functional capacity (TFC) [13]. The Hoehn and Yahr scale was used to assess the PD subjects’ degree of impairment [30]. The ALS subjects were not categorized by severity, since there is no standard approach for evaluating the degree of the disease [31].

The data were collected at the Neurology Clinic of Massachusetts General Hospital (MGH). Walking gait data were recorded with ultrathin force-sensitive foot-sole sensors placed in the subjects’ shoes, and the subjects were asked to walk for 5 min along a 77 m long straight hallway. The data were recorded at a sampling rate of 300 Hz, and the initial 20 s were ignored to eliminate the startup effect [32]. The vGRF signal was digitized and processed by the gait-cycle segmentation algorithm proposed by Hausdorff et al. [33] to derive the stride-to-stride foot contact times. After segmenting the signal into individual gait cycles, the following spatiotemporal gait parameters were extracted from each segment:

1. Left Stride Interval (sec)
2. Right Stride Interval (sec)
3. Left Swing Interval (sec)
4. Right Swing Interval (sec)
5. Left Swing Interval (% of Gait-Cycle)
6. Right Swing Interval (% of Gait-Cycle)
7. Left Stance Interval (sec)
8. Right Stance Interval (sec)
9. Left Stance Interval (% of Gait-Cycle)
10. Right Stance Interval (% of Gait-Cycle)
11. Double Support Interval (sec)
12. Double Support Interval (% of Gait-Cycle)
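A minimal loader for a record containing these parameters is sketched below. It assumes a whitespace-separated text file with one row per stride, in which an elapsed-time column (sec) precedes the 12 parameters in the order listed; this layout should be checked against the PhysioNet documentation before relying on it. The column names are illustrative and the two sample rows are made up.

```python
import io
import numpy as np

# Assumed column order: elapsed time, then the 12 gait parameters in list order.
COLUMNS = [
    "elapsed_time", "left_stride", "right_stride", "left_swing", "right_swing",
    "left_swing_pct", "right_swing_pct", "left_stance", "right_stance",
    "left_stance_pct", "right_stance_pct", "double_support", "double_support_pct",
]

def load_gait_parameters(source):
    """Load a whitespace-separated gait-parameter record into a dict of arrays."""
    data = np.loadtxt(source, ndmin=2)
    if data.shape[1] != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {data.shape[1]}")
    return {name: data[:, i] for i, name in enumerate(COLUMNS)}

# Two hypothetical rows in the assumed format.
sample = io.StringIO(
    "20.1 1.10 1.12 0.40 0.41 36.4 36.6 0.70 0.71 63.6 63.4 0.30 27.3\n"
    "21.2 1.11 1.09 0.41 0.40 36.9 36.7 0.70 0.69 63.1 63.3 0.29 26.1\n"
)
record = load_gait_parameters(sample)
print(record["left_stride"])
```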

NDDs affect muscular strength, which results in slower walking speed and greater stride-to-stride variability. These variations can be observed in Fig. 2 for the different NDD classes compared with the control group.

Fig. 2

Variations in the gait parameters across the different groups of subjects in the dataset. The healthy group has a higher walking velocity and shorter stride intervals than the disease groups

2.2 Preprocessing

Data pre-processing plays an important role when 1D signals are used with deep learning algorithms. Recordings are affected by noise from the surrounding environment, which introduces unrelated or redundant data that may affect the analysis. Several pre-processing techniques are used to enhance the raw signal, such as outlier removal, noise filtering, standardization, feature extraction, and feature selection [34,35,36,37,38].

Force signals recorded during the initial part of each trial were inconsistent because of the “startup effect” [39]. To eliminate this effect, the first 20 s of the recordings were discarded, as in similar studies [40]. The force signal was then processed by a second-order Butterworth low-pass filter with a 15 Hz cut-off frequency to remove high-frequency noise; the filter order and cut-off frequency were chosen based on the literature on processing GRF signals [41, 42]. Next, the amplitude of the signal was normalized using z-score normalization, a widely used normalization technique for 1D signals. Finally, each gait-cycle segment of the force signal was resampled to 120 data points to obtain a fixed-length input for the deep learning model.
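The steps above can be sketched with SciPy and NumPy as follows. Several details are assumptions because the text does not specify them: zero-phase filtering with `filtfilt`, z-scoring over the whole recording rather than per cycle, linear interpolation for resampling, and the synthetic signal and cycle boundaries used in the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 300.0        # sampling rate of the dataset (Hz)
CUTOFF_HZ = 15.0  # low-pass cut-off frequency
CYCLE_LEN = 120   # samples per gait cycle after resampling

def preprocess_recording(raw_force, fs=FS, skip_seconds=20.0):
    """Drop the startup segment, low-pass filter, and z-score a vGRF recording."""
    signal = np.asarray(raw_force, dtype=float)[int(skip_seconds * fs):]
    b, a = butter(2, CUTOFF_HZ, btype="low", fs=fs)  # 2nd-order Butterworth
    filtered = filtfilt(b, a, signal)                # zero-phase filtering (assumption)
    return (filtered - filtered.mean()) / filtered.std()

def resample_cycle(cycle, n_points=CYCLE_LEN):
    """Linearly interpolate one gait-cycle segment onto n_points samples."""
    old_grid = np.linspace(0.0, 1.0, len(cycle))
    new_grid = np.linspace(0.0, 1.0, n_points)
    return np.interp(new_grid, old_grid, cycle)

# Synthetic 60 s recording: half-wave "steps" every 1.1 s plus sensor noise.
rng = np.random.default_rng(0)
t = np.arange(int(60 * FS)) / FS
raw = 400.0 * np.clip(np.sin(2 * np.pi * t / 1.1), 0.0, None) + rng.normal(0, 20, t.size)
clean = preprocess_recording(raw)

# Hypothetical cycle boundaries (sample indices); in the dataset these would come
# from the gait-cycle segmentation step.
boundaries = [0, 330, 660, 990]
cycles = np.stack([resample_cycle(clean[s:e]) for s, e in zip(boundaries[:-1], boundaries[1:])])
print(clean.shape, cycles.shape)  # (12000,) (3, 120)
```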

The extracted gait-cycle parameters also required pre-processing. Data points were also recorded at the end of the hallway, when the subjects turned around to walk back. This so-called turning-back effect produces irregular data points. Several studies on this dataset mitigated the effect using the 3-sigma rule [39, 43, 44]. Following this rule, any data point lying more than 3 SD above or below the overall median was removed during pre-processing.
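A small sketch of the 3-sigma rule as described here, assuming the population SD of the parameter series (NumPy's default) is used; the stride values are hypothetical.

```python
import numpy as np

def remove_outliers_3sigma(values, k=3.0):
    """Keep points within k standard deviations of the median; return (kept, mask)."""
    x = np.asarray(values, dtype=float)
    mask = np.abs(x - np.median(x)) <= k * x.std()
    return x[mask], mask

# Hypothetical stride intervals (s); the 3.5 s value mimics a turn at the hallway end.
strides = np.concatenate([np.full(20, 1.10), np.full(19, 1.12), [3.5]])
kept, mask = remove_outliers_3sigma(strides)
print(len(strides), "->", len(kept))  # 40 -> 39
```

With very short series, a single outlier can inflate the SD enough to escape the rule, so the filter is most effective on records with many strides, such as the 5 min walks in this dataset.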

2.3 Applying transformations to the vGRF signal

The vGRF signal describes the vertical force applied by the foot during the different phases of the gait cycle. The shape and pattern of a subject’s vGRF signal provide important information about their walking pattern. Because these neurological disorders impair motor function, any activity that involves movement is affected. As a result, NDD patients are expected to show gait variability that is reflected in their vGRF signal patterns. Figure 3a displays the vGRF signals from the different groups of subjects in the dataset, and clear differences can be observed among the groups. From our experimentation and visualization of the dataset, we found that the rate of change of the vGRF signal provides useful information about a subject’s disease group, which can help the classification process. We therefore added three channels alongside the original vGRF signal, representing its first derivative, second derivative, and integral with respect to time. Figures 3b, c, and d show how these patterns differ between the disease groups and the healthy control group.
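The three extra channels can be computed numerically, for example as in the sketch below: NumPy's `gradient` gives central-difference derivatives and a cumulative trapezoidal sum gives the integral. The choice of these numerical schemes, the zero initial value of the integral, and the sampling interval of a resampled cycle (here a hypothetical 1.1 s cycle spread over 120 samples) are assumptions. Stacking the four channels for each foot yields 8 input channels.

```python
import numpy as np

def vgrf_channels(vgrf, fs):
    """Stack a vGRF segment with its first derivative, second derivative, and integral.

    fs is the sampling rate of the segment in Hz (after any resampling).
    """
    x = np.asarray(vgrf, dtype=float)
    dt = 1.0 / fs
    first = np.gradient(x, dt)       # central differences, one-sided at the ends
    second = np.gradient(first, dt)
    # Cumulative trapezoidal integral, starting from 0 at the first sample.
    integral = np.concatenate(([0.0], np.cumsum((x[1:] + x[:-1]) * dt / 2.0)))
    return np.stack([x, first, second, integral])  # shape: (4, len(x))

# Hypothetical resampled cycles: 120 samples spanning an assumed 1.1 s gait cycle.
fs_resampled = 119 / 1.1  # 119 intervals between the 120 samples
left = np.sin(np.linspace(0.0, np.pi, 120)) ** 2
right = np.sin(np.linspace(0.0, np.pi, 120))
inputs = np.concatenate([vgrf_channels(left, fs_resampled), vgrf_channels(right, fs_resampled)])
print(inputs.shape)  # (8, 120)
```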

Fig. 3

vGRF signal patterns for different groups of subjects in the dataset. The line represents the mean value and the shaded region represents the 1 SD region from the mean value. a vGRF signal from the left foot. b The first derivative of the vGRF signal with respect to time. c The second derivative of the vGRF signal with respect to time. d Integral of the vGRF signal with respect to time.

2.4 Feature extraction

In addition to the parameters provided in the dataset, some gait-cycle features were extracted. First, the total distance covered by each subject was calculated by multiplying their average velocity by their walking time. From this, the average stride distance and the instantaneous velocity were calculated using Eqs. (1)–(3). The cadence of each subject was calculated using Eq. (4), and the BMI of each subject was calculated from their height and weight using Eq. (5). Two subjects had missing weights and one subject had a missing velocity; these values were imputed using the median value of the corresponding group. The gender of the subjects was also included as a categorical feature.

$$ Total\ Distance\ Covered= Average\ Velocity\times Walking\ Duration $$
(1)
$$ Average\ Stride\ Distance=\frac{Total\ Distance\ Covered}{Number\ of\ Steps} $$
(2)
$$ Instantaneous\ Velocity=\frac{Average\ Stride\ Distance}{Stride\ Interval} $$
(3)
$$ Cadence=\frac{Total\ Number\ of\ Steps}{Walking\ Duration\ \left(\mathit{\min}\right)} $$
(4)
$$ BMI=\frac{Weight(kg)}{Height{(m)}^2} $$
(5)
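Eqs. (1)–(5) translate directly into code; the sketch below evaluates them for one hypothetical subject (1.2 m/s average velocity, a 300 s walk with 250 steps, 70 kg, 1.75 m). The function and argument names are illustrative.

```python
def derived_gait_features(avg_velocity, walking_duration_s, n_steps,
                          stride_intervals_s, weight_kg, height_m):
    """Hand-crafted features following Eqs. (1)-(5); names are illustrative."""
    total_distance = avg_velocity * walking_duration_s                     # Eq. (1), m
    avg_stride_distance = total_distance / n_steps                         # Eq. (2), m
    inst_velocity = [avg_stride_distance / s for s in stride_intervals_s]  # Eq. (3), m/s
    cadence = n_steps / (walking_duration_s / 60.0)                        # Eq. (4), per min
    bmi = weight_kg / height_m ** 2                                        # Eq. (5), kg/m^2
    return {"total_distance": total_distance, "avg_stride_distance": avg_stride_distance,
            "inst_velocity": inst_velocity, "cadence": cadence, "bmi": bmi}

features = derived_gait_features(avg_velocity=1.2, walking_duration_s=300.0, n_steps=250,
                                 stride_intervals_s=[1.2, 1.0], weight_kg=70.0, height_m=1.75)
print(features)
```

For these inputs the total distance is 360 m, the average stride distance 1.44 m, the cadence 50 steps/min, and the BMI about 22.9 kg/m².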

2.5 The proposed network architecture

In this section, we present the proposed network architecture for this study (Fig. 4). Two separate types of data were utilized as the input to the network. The vGRF signals from both feet and their corresponding integral, derivative and second derivative channels produce 8 channels of time series data. Each of these channels was processed by a 1D ConvMixer block [27].

Fig. 4

The proposed network architecture. The network is composed of two parallel parts for processing the vGRF signals and the gait-cycle parameters. a The entire network architecture. b The ConvMixer block architecture

The ConvMixer architecture proposed by Trockman et al. [27] has shown strong performance in computer vision tasks. Although its performance in the 1D domain has yet to be explored, its simplicity and comparatively better performance than similar vanilla CNN architectures encouraged us to use it in this study. ConvMixer adopts the isotropic design of Vision Transformers (ViT) [45], in which the input image (or signal, in this case) is divided into small patches that are linearly projected before a stack of identical blocks is applied; however, where ViT applies Transformer [46] blocks, ConvMixer applies convolutional mixing blocks. The ConvMixer network starts with a convolutional stem that extracts patch embeddings using a convolution layer whose kernel size and stride are equal, set to 5 in this case. The patch embeddings then pass through an activation block consisting of a Gaussian Error Linear Unit (GELU) [47] activation layer and a batch normalization layer, and then into the ConvMixer blocks. Each ConvMixer block consists of a depthwise convolution layer, a pointwise convolution layer, and a residual connection, as shown in Fig. 4b. The depth, width, and kernel size of the ConvMixer branches were tuned and set to 3, 16, and 5, respectively. The ConvMixer blocks are followed by a global average pooling layer that produces the embeddings. Finally, the embeddings from all the parallel paths are concatenated and passed to a dense layer of width 32. The 15 extracted features, on the other hand, are processed by two dense layers of widths 32 and 16, respectively. These feature embeddings are then concatenated with the embeddings from the GRF signal processing branch and passed to the final layer, which produces the output.
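To make the data flow concrete, the sketch below runs a forward pass of the described architecture in NumPy with random, untrained weights: one 1D ConvMixer branch (patch size 5, width 16, kernel size 5, depth 3) per vGRF channel, global average pooling, and the fusion with the 15-feature branch. It is a shape-level illustration, not the authors' TensorFlow model: batch normalization and biases are omitted, GELU uses its tanh approximation, ReLU activations in the dense layers and a sigmoid output are assumptions, and the inputs are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGNAL_LEN, PATCH, WIDTH, KERNEL, DEPTH = 120, 5, 16, 5, 3
N_CHANNELS, N_FEATURES = 8, 15

def gelu(x):
    """Tanh approximation of the Gaussian Error Linear Unit."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def depthwise_conv1d(z, kernels):
    """'Same'-length depthwise convolution: channel c is filtered only by kernels[:, c]."""
    return np.stack([np.correlate(z[:, c], kernels[:, c], mode="same")
                     for c in range(z.shape[1])], axis=1)

def convmixer_branch(signal, params):
    """Forward pass of one 1D ConvMixer branch (batch normalization omitted)."""
    # Stem: a convolution with kernel size == stride == PATCH equals reshape + matmul.
    z = gelu(signal.reshape(-1, PATCH) @ params["stem"])   # (SIGNAL_LEN // PATCH, WIDTH)
    for dw_kernels, pw_weights in params["blocks"]:
        z = z + gelu(depthwise_conv1d(z, dw_kernels))      # spatial mixing + residual
        z = gelu(z @ pw_weights)                           # pointwise (channel) mixing
    return z.mean(axis=0)                                  # global average pooling

def random_weights(*shape):
    return rng.normal(0.0, 0.3, shape)

def dense(x, w, relu=True):
    y = x @ w                                              # biases omitted for brevity
    return np.maximum(y, 0.0) if relu else y

branch_params = [{"stem": random_weights(PATCH, WIDTH),
                  "blocks": [(random_weights(KERNEL, WIDTH), random_weights(WIDTH, WIDTH))
                             for _ in range(DEPTH)]}
                 for _ in range(N_CHANNELS)]

channels = rng.normal(size=(N_CHANNELS, SIGNAL_LEN))  # stand-in for the 8 vGRF channels
features = rng.normal(size=N_FEATURES)                # stand-in for the 15 features

signal_emb = np.concatenate([convmixer_branch(ch, p) for ch, p in zip(channels, branch_params)])
signal_emb = dense(signal_emb, random_weights(N_CHANNELS * WIDTH, 32))                # (32,)
feature_emb = dense(dense(features, random_weights(N_FEATURES, 32)), random_weights(32, 16))
fused = np.concatenate([signal_emb, feature_emb])                                     # (48,)
probability = 1.0 / (1.0 + np.exp(-dense(fused, random_weights(48, 1), relu=False)))
print(signal_emb.shape, feature_emb.shape, probability.shape)  # (32,) (16,) (1,)
```

With 120-sample inputs and a patch size of 5, each branch operates on 24 patch embeddings of width 16, so the 8 branches contribute 128 values before the 32-unit dense layer.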

The network was implemented using the TensorFlow [48] Python library. Binary cross-entropy was used as the loss function, and the Adam [49] optimizer was used with a learning rate of 0.0003. The network was trained for a maximum of 300 epochs, with early stopping on the validation loss using a patience of 30 epochs. All models were trained on a laptop with an 11th Gen Intel(R) Core(TM) i5-1135G7 processor running at 2.40 GHz and 16 GB of RAM; no GPU was used for training or inference. The source code of this project is publicly available to support reproducibility in future research.
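In Keras, this training setup would typically use `tf.keras.optimizers.Adam(learning_rate=3e-4)` together with a `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=30)` callback; whether the best weights were restored is not stated. The framework-free sketch below only mimics the stopping rule on a hypothetical validation-loss curve.

```python
def early_stopping_epoch(val_losses, patience=30, max_epochs=300):
    """Return (stop_epoch, best_epoch) under patience-based early stopping on val loss.

    Training stops once `patience` consecutive epochs pass without a new
    minimum validation loss, or after `max_epochs` epochs.
    """
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch

# Hypothetical curve: validation loss improves for 50 epochs, then plateaus.
val_curve = [1.0 / e for e in range(1, 51)] + [0.5] * 300
print(early_stopping_epoch(val_curve))  # (80, 50)
```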

2.6 Experiments

As mentioned previously, the “Gait Dynamics in NDD Database” contains three NDD classes, PD, HD, and ALS, in addition to healthy control (HC). To distinguish each disease pattern from healthy control, four binary classification setups were used: three compared each disease group with the HC group, and in the fourth, all the disease groups were combined into a single group (NDD) to be differentiated from the healthy group. Additionally, a multi-class classification was performed on the 4 classes of data present in the dataset to evaluate the proposed algorithm’s capability to distinguish the NDDs from each other. The conducted experiments were as follows:

  • Binary Classification experiment between the ALS and the HC group

  • Binary Classification experiment between the HD and the HC group

  • Binary Classification experiment between the PD and the HC group

  • Binary Classification experiment between the NDD and the HC group

  • Multi-Class Classification experiment among all the different classes
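The five setups amount to different labelings of the same subjects. A minimal sketch, assuming HC is always label 0 and using hypothetical subject IDs generated from the group sizes given in Section 2.1:

```python
CLASSES = ["HC", "ALS", "HD", "PD"]
SUBJECT_COUNTS = {"HC": 16, "HD": 20, "PD": 15, "ALS": 13}

def build_experiment(subject_groups, experiment):
    """Return (subject_ids, labels) for 'ALS', 'HD', 'PD', 'NDD', or 'multiclass'."""
    ids, labels = [], []
    for subject_id, group in subject_groups.items():
        if experiment == "multiclass":
            label = CLASSES.index(group)
        elif group == "HC":
            label = 0
        elif experiment == "NDD" or group == experiment:
            label = 1
        else:
            continue  # disease group not used in this binary experiment
        ids.append(subject_id)
        labels.append(label)
    return ids, labels

# Hypothetical subject IDs generated from the group sizes of the dataset.
subjects = {f"{group.lower()}{i}": group
            for group, n in SUBJECT_COUNTS.items() for i in range(1, n + 1)}
for exp in ["ALS", "HD", "PD", "NDD", "multiclass"]:
    ids, labels = build_experiment(subjects, exp)
    print(exp, len(ids), "subjects", sorted(set(labels)))
```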

2.7 Validation and evaluation

The Leave-One-Out Cross-Validation (LOOCV) approach was used to evaluate the proposed model’s performance on independent subjects. In LOOCV, one subject’s data were used for testing, while the data from the remaining subjects were used for training. For validation, 30% of the subjects in the training set were held out and their data were used as the validation set. Because all splits were made at the subject level, no data leaked between the training, validation, and test sets. For evaluation, precision, sensitivity (recall), F1-score, and accuracy were used (Eqs. (6)–(9)) [50].
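The subject-wise splitting can be sketched as follows. The random (non-stratified) choice of validation subjects and the seed are assumptions; the text only states that 30% of the training subjects were used for validation. Each subject's gait cycles would then be gathered according to these subject lists.

```python
import random

def loocv_splits(subject_ids, val_fraction=0.3, seed=0):
    """Yield (train, val, test) subject lists for subject-wise LOOCV."""
    rng = random.Random(seed)
    for test_subject in subject_ids:
        remaining = [s for s in subject_ids if s != test_subject]
        rng.shuffle(remaining)
        n_val = round(val_fraction * len(remaining))
        yield remaining[n_val:], remaining[:n_val], [test_subject]

subject_ids = [f"subject{i:02d}" for i in range(64)]   # hypothetical IDs
splits = list(loocv_splits(subject_ids))
train, val, test = splits[0]
print(len(splits), len(train), len(val), len(test))  # 64 44 19 1
```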

$$ Precision=\frac{T_p}{T_p+{F}_p} $$
(6)
$$ Recall=\frac{T_p}{T_p+{F}_n} $$
(7)
$$ F1\ score=\frac{2\cdot Precision\cdot Recall}{Precision+ Recall} $$
(8)
$$ Accuracy=\frac{T_p+{T}_n}{T_p+{T}_n+{F}_p+{F}_n} $$
(9)
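These definitions translate into a few lines of code; the sketch below takes F1 as the harmonic mean of precision and recall, 2·P·R/(P + R), equivalently 2TP/(2TP + FP + FN). Returning 0 when a denominator is zero is an assumed convention, and the counts in the example are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (sensitivity), F1-score, and accuracy from confusion counts."""
    def ratio(num, den):
        return num / den if den else 0.0   # treat 0/0 as 0 (assumed convention)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)  # harmonic mean
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

print(classification_metrics(tp=90, fp=10, fn=5, tn=95))
```

For TP = 90, FP = 10, FN = 5, and TN = 95, this gives a precision of 0.90, a recall of about 0.947, an F1-score of about 0.923, and an accuracy of 0.925.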

3 Results and discussion

As mentioned in the “Experiments” section, five experiments were carried out in this study. For comparison, the results are summarized in Tables 1 and 2 for vGRF signals alone and vGRF with features, respectively. It is worth mentioning here that for vGRF alone, vGRF signals from both feet and their 1st and 2nd derivatives and integrals were used. The precision, recall, F1-score, and accuracy score are reported for all the experiments. Sample-wise metrics were used, where each sample represents a single gait cycle. In each table, the last column of the table represents the number of subjects who were incorrectly classified.

Table 1 Experimental results for different NDDs using only vGRF channels as input
Table 2 Experimental results for different NDDs using the proposed approach

In Table 1, we report the metrics achieved using only the vGRF signals from both feet and their derivatives, second derivatives, and integrals as inputs to our model. The model performs well for ALS and PD detection but performs poorly for HD detection, NDD detection, and multi-class NDD classification.

Table 2 reports the results of the proposed model using both the vGRF signals and the extracted features as inputs. Comparing Tables 1 and 2 shows that adding the features substantially improved the results for all classification tasks.

For the binary tests, adding the features improved accuracy by 14% on average and markedly reduced the number of misclassified subjects. For the ALS vs HC experiment, the proposed method correctly classifies all subjects, and for the HD vs HC and PD vs HC experiments, it misclassifies only one subject each.

Figure 5 shows the confusion matrices for all the experiments discussed above using the proposed algorithm. The confusion matrices were calculated subject-wise: the predicted label for a subject was determined by majority voting, that is, the subject was assigned to the class predicted for the majority of that subject’s samples. The Receiver Operating Characteristic (ROC) curves and the Precision-Recall curves for the classification tasks are shown in Fig. 6.
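Subject-wise majority voting can be sketched as below. How ties were broken is not stated; this version relies on `collections.Counter.most_common`, which orders equal counts by first occurrence, so a tie goes to the class predicted first. The predictions are hypothetical.

```python
from collections import Counter

def subject_prediction(cycle_predictions):
    """Class predicted for the majority of a subject's gait cycles."""
    return Counter(cycle_predictions).most_common(1)[0][0]

def subject_confusion_matrix(predictions_by_subject, true_labels, classes):
    """Rows: true class, columns: predicted class, one count per subject."""
    matrix = [[0] * len(classes) for _ in classes]
    for subject, cycle_preds in predictions_by_subject.items():
        row = classes.index(true_labels[subject])
        col = classes.index(subject_prediction(cycle_preds))
        matrix[row][col] += 1
    return matrix

# Hypothetical cycle-level predictions for three subjects.
predictions = {"s1": ["HC", "HC", "PD"], "s2": ["PD", "PD", "PD", "HC"], "s3": ["HC", "PD", "PD"]}
truth = {"s1": "HC", "s2": "PD", "s3": "HC"}
print(subject_confusion_matrix(predictions, truth, ["HC", "PD"]))  # [[1, 1], [0, 1]]
```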

Fig. 5

Confusion matrices for the 5 experiments performed on the dataset. The confusion matrices are calculated subject-wise where all the samples from a particular subject were classified into a single class based on majority voting

Fig. 6

a The Receiver Operating Characteristic (ROC) curve for all the binary classification tasks. b The Precision-Recall curve for all the binary classification tasks

Table 3 compares the accuracy of our proposed model with other machine learning-based work on the GaitNDD dataset used in this work. Note that the validation technique used to produce the metrics strongly affects their values. Ideally, the test set should be independent of the training set to provide the most accurate metrics. If the same subject’s data appear in both the training and test sets, the problem becomes much easier and the evaluation metrics become inflated. One way to prevent this is Leave-One-Out Cross-Validation (LOOCV) [58], in which one subject’s data are held out as the test set while the data from the other subjects are used for training. Most previous work on this dataset has used this approach, so, to be comparable with the existing literature, we also used LOOCV for our experiments. Ten-fold cross-validation is also a good validation approach here if the folds are created subject-wise. In [26, 40], a 10-fold CV technique was used, and the reported metrics are better than ours in some cases. However, their implementation of 10-fold CV has a crucial issue. As mentioned earlier, we want to evaluate the model’s performance on a completely unseen subject’s data, which simulates real-world conditions; this is achieved by placing all the data from a particular subject in the test set. In [26, 40], however, the authors mixed the samples from all subjects before creating the 10 folds, so the same subject’s data can be present in both the training and test sets. Consequently, their reported metrics are not representative of the model’s performance on unseen subjects’ data.
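The leakage problem can be demonstrated with scikit-learn on placeholder data: shuffling gait cycles before a 10-fold split places cycles of the same subject on both sides of every fold, whereas `GroupKFold` keeps each subject's cycles together. The subject and cycle counts are arbitrary; scikit-learn's `LeaveOneGroupOut` would give the subject-wise LOOCV used in this work.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

rng = np.random.default_rng(0)
n_subjects, cycles_per_subject = 64, 20
groups = np.repeat(np.arange(n_subjects), cycles_per_subject)  # subject ID of each gait cycle
X = rng.normal(size=(groups.size, 4))                           # placeholder features

def count_leaky_folds(splitter):
    """Number of folds in which some subject appears in both the train and test sets."""
    leaky = 0
    for train_idx, test_idx in splitter.split(X, groups=groups):
        if set(groups[train_idx]) & set(groups[test_idx]):
            leaky += 1
    return leaky

record_wise = KFold(n_splits=10, shuffle=True, random_state=0)  # cycles mixed before folding
subject_wise = GroupKFold(n_splits=10)                          # folds built from whole subjects
print(count_leaky_folds(record_wise), count_leaky_folds(subject_wise))  # 10 0
```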

Table 3 Summary of recent works on NDD detection from force signals

In [25], the authors used Fuzzy Recurrence Plots (FRP) to generate texture images from the gait parameters, and their reported results are better than ours. However, their approach generates only one texture image per subject, so the 64 subjects belonging to 4 classes yield only 64 images, between 13 and 20 per class. This is a very small number of images for training a machine learning algorithm that generalizes well. In contrast, we used every single gait cycle as a training example, which yielded 14,412 training examples, a far larger training set. Moreover, the approach in [25] requires 5 minutes of walking data to generate a single texture image, which is not suitable for real-time deployment. Generating texture images using FRP is also computationally expensive, which limits the deployment options for that algorithm. The proposed solution, on the other hand, is robust and deployable in real time, with comparable performance.

3.1 Applications

The proposed algorithm demonstrates a novel way of combining the GRF signal, its various transformations, and hand-crafted features to predict three different NDDs. This opens up a new way of analyzing GRF signals with deep learning that future researchers could adopt in similar domains. The proposed deep learning network is lightweight (239 KB) and could be deployed on edge devices such as smartphones. Early detection of NDDs is important for managing disease progression and providing proper treatment. Various wearable foot-sole sensors that provide real-time GRF signals are now available on the market. Combining our algorithm with these devices could provide a real-time gait monitoring system for the early detection of NDDs, which has great potential in the clinical domain.

3.2 Limitations

The dataset used in this work is relatively small, containing data from 13–20 subjects per class. Because the sample size is small, it may not be representative of the whole population. The data are also not balanced across genders. We tried to mitigate the gender imbalance by using gender as a feature when training our models, but a dataset with a similar amount of data from all genders would be preferable. Although the proposed algorithm performed well in distinguishing a single NDD from the healthy class, it did not perform as well in the multi-class experiment, because distinguishing one NDD from another based on gait is a much more challenging task.

4 Conclusion and future work

Neurodegenerative diseases (NDDs) greatly affect the health and quality of life of millions of people around the world. In this paper, we presented an automatic detection technique for three common neurodegenerative diseases: Amyotrophic Lateral Sclerosis (ALS), Huntington’s disease (HD), and Parkinson’s disease (PD). These diseases share symptoms that affect the patients’ motor capabilities, which are reflected in their walking gait. Force sensors placed in the foot sole were used to acquire GRF data during walking for the prediction of different NDDs. While previous studies in this domain focused on either the GRF signals or the gait parameters, we developed a deep learning network that uses both as input to improve prediction capability. By using both the raw signals and hand-crafted features, the proposed model achieved results better than or comparable to those of previous works on the dataset used in this work.

Although strong results were obtained in this study for detecting NDDs, training on a larger dataset is important for building a robust model that generalizes reliably to real-world scenarios. Because of the small size of the dataset, the model’s architecture and hyperparameters required careful tuning to limit overfitting. In future work, we plan to apply the proposed model to a larger dataset to produce a better framework for detecting NDDs. We also plan to use Generative Adversarial Networks (GANs) such as CycleGAN to augment the data and make the dataset larger and more general. We believe that this approach, combined with our proposed algorithm, will improve the performance and robustness of the current work.