1 Introduction

The ability to walk is crucial for maintaining a good quality of life. Humans walk at many different speeds throughout the day in response to sociocultural factors. Self-chosen gait speed requires the selection of stride length [42], joint angular displacement [22, 57], and joint torque and power [8, 45]. However, studies have found that the ability to maintain maximum walking speed declines with age [27, 65]. This decline can result from numerous age-related physiological changes, including joint degeneration [5], muscle weakness [46] and neurological diseases [50]. Furthermore, a few cohort studies have affirmed the association of reduced gait speed with cardiovascular disease [38] and dementia [24]. In contrast, walking speed, and not age, has been considered the primary determinant of kinematic and kinetic changes in children and young adults [59, 66]. In light of this, knowledge of the effects of gait speed on biomechanical variables is paramount for clinicians, who commonly rely on the outcomes of gait analysis to gauge the functional level of patients and to optimize patient care.

In clinical practice, gait analysis is carried out by clinicians via visual examination [20]. They evaluate the movement patterns of key body segments such as the foot, ankle, knee, hip, pelvis and trunk during each phase of the gait cycle [30, 48, 58]. Manually tracking many segments during a dynamic movement makes clinical assessment highly complex. Because multiple segments move concurrently during a dynamic motion, important gait issues can be overlooked during an assessment [7] and when using observational scales [20, 53]. Several studies and reviews have reported poor reliability of observational gait analysis [7, 49, 51]. Tanikawa et al. estimated poor inter-rater reliability (Cohen’s kappa coefficient 0.1–0.3) even among experienced clinicians [3]. The poor reliability could be attributed to variation in patient state [64], the lack of operational definitions of gait parameters and insufficient training of clinicians [62]. As such, observational or bedside gait analysis is of limited value in patient management.

The limitations of the observational method have led to the development of instrumented gait analysis. In this approach, computerized measurement technology is employed to evaluate the rigid segments of a human model [34]. The three-dimensional (3D) pose of the human body is captured in six degrees of freedom (DOF), three translational and three rotational [13]. With the incorporation of body segment inertial parameters, the whole-body centre of mass location can be deduced, thereby enhancing the interpretive power of a kinematic evaluation [55]. Moreover, kinematic and kinetic data can be combined to calculate joint moments and net joint reaction forces through inverse dynamics analysis [19]. Instrumented gait analysis thus enables accurate quantification of whole-body pose and provides clinicians with a comprehensive understanding of the underlying conditions that affect a patient’s mobility [61].

With the growth of motion capture systems [44, 47, 48, 67] in both visual surveillance and human-machine interfaces [31], there is great potential to automate human gait analysis [2, 52, 60]. Different forms of gait biometrics can be obtained depending on how the gait information is measured, either via sensor-based or camera-based approaches [9, 14, 15]. Various physical factors contributing to human gait, such as height, weight, leg length and joint proportions, form an intrinsic gait pattern unique to every individual. However, data representations of gait tend to be large and highly variable [32], making potentially important patterns within the data unrecognisable. To filter these data, it is crucial to develop a data-driven method that can extract the intrinsic and distinguishable gait patterns from camera or sensor measurements. Studies have suggested that machine learning methods, which can learn and uncover the underlying distribution of data, are well suited to this objective [18, 29]. These techniques have long been deployed in predictive modelling and data mining. Predictive modelling is concerned with finding a function that optimally maps input data to a given output, with the goal of making accurate predictions on unseen data [1]. One example of predictive modelling in biomechanics is the detection of cerebral palsy in children using data from a motion capture system, where models are trained to identify children with neuromotor disability based on postural-point features [16]. More recent efforts have centred on models for fall detection [26], activity recognition to facilitate out-of-clinic patient monitoring during rehabilitation [36], and event detection to guide interventions such as gait modification [63] and orthopaedic surgery [11]. Data mining, on the other hand, is employed to discover new patterns in the data. Its applications include using clustering and classification methods to identify subpopulations that exhibit different types of pathological gait [32].

Machine learning is ubiquitous, impacting many aspects of our lives [32]. Recent advances in machine learning have demonstrated remarkable performance in supervised learning tasks, enabling a wide range of applications [54]. Support vector machines (SVMs) have been widely employed to detect gait subphases from time-series measurements of knee and hip angles. A study conducted by Luo et al. used an SVM to distinguish normal from abnormal gait by detecting the sequence of gait phases [37, 43]. Convolutional neural networks (CNNs), which combine local feature extraction, weight sharing and pooling, have been employed in various studies [17, 68] to solve multiclass classification problems, capture correlations in temporal data of human motion activities [39] and predict gait periods [25]. However, CNNs do not capture temporal dependencies [25, 33]. In contrast, recurrent neural networks (RNNs) have achieved promising results in sequence modelling tasks such as automatic speech recognition [23, 41] and machine translation [12]. Filtjens et al. adopted an RNN to analyse the inertial signals of Parkinson’s patients with freezing of gait [21]. However, an ordinary RNN suffers from vanishing gradients, which prevents it from handling long sequences [12]. To overcome this limitation, a modified version of the RNN was developed: long short-term memory (LSTM). With gating mechanisms, an LSTM can handle long-term dependencies in sequential data [4]. LSTM has demonstrated success in classifying gait events in children [35].

The aim of this work is to develop learning methods that advance the automatic analysis of human gait and interpret human walking speed from a data-driven perspective using kinematic data. For this purpose, we propose an RNN model for supervised classification. The model is trained to capture the temporal dependencies and characteristics of human gait data for the classification of walking speed. The presented approach investigates the suitability of state-of-the-art machine learning methods for understanding and interpreting the classification of gait patterns. This project therefore presents a first step towards establishing a powerful tool that can serve as the basis for future applications of machine learning in human movement analysis.

2 Methodology

2.1 Subject selection criteria

Seventeen subjects without any neuromuscular impairment, aged between 20 and 24 years, were selected for this study. The inclusion criteria consisted of two requirements: (i) the subject must be over 18 years old; and (ii) the subject should be representative of an average healthy young adult who has not experienced any damage to their bone or muscular structure that currently impacts their ability to walk. All subjects were asked to disclose any known conditions that affected their walking speed, for the accuracy of the study. A user profile containing information such as subject number, height, weight, sex, age, ethnicity and frequency of physical activity was filled in by each subject as part of a survey and recorded. The subjects were briefed on the procedures and steps involved in the data collection before the experiment was conducted. The entire data collection process followed the protocol approved by the NUS Institutional Review Board (Reference code: B-14-265).

2.2 Data collection

The walking trials were conducted at the NUS-BME Gait Laboratory, Department of Biomedical Engineering, National University of Singapore. The gait analysis was performed using the Vicon Motion System (Vicon MX, Oxford Metrics, UK). This system comprises eight 100 Hz high-speed cameras, two AMTI force plates (Watertown, USA), and a data station (Vicon MX Control) where the captured walking trajectories were processed. Anthropometric measurements were acquired for each subject, including the bilateral leg length and the ankle and knee widths. Reflective markers of 14 mm were placed on the pelvis and lower limbs of the subjects according to the Vicon Plug-in Gait biomechanical model (Fig. 1).

Fig. 1 Marker placement for Plug-In Gait from front (right) and back (left) view

Before data collection began, a static trial was performed for each subject to calibrate the marker labelling. This helped to ensure that the markers attached to the subject were digitised in the camera view and that the segment coordinate systems were defined relative to them. All subjects were requested to perform self-selected slow, normal and fast walking, with three trials for each speed. Subjects walked barefoot at each of the administered walking speeds.

The self-selected walking speed of the subjects in every trial was characterized, in a post-hoc manner, as slow, normal or fast:

$$\begin{array}{ll}\mathrm{slow}: & 0<{\mathrm{v}}^{\ast}\le {\overline{\mathrm{v}}}_{\mathrm{normal}}^{\ast }-{\upsigma}_{\mathrm{normal}}^{\ast}\\ \mathrm{normal}: & {\overline{\mathrm{v}}}_{\mathrm{normal}}^{\ast }-{\upsigma}_{\mathrm{normal}}^{\ast }<{\mathrm{v}}^{\ast}\le {\overline{\mathrm{v}}}_{\mathrm{normal}}^{\ast }+{\upsigma}_{\mathrm{normal}}^{\ast}\\ \mathrm{fast}: & {\overline{\mathrm{v}}}_{\mathrm{normal}}^{\ast }+{\upsigma}_{\mathrm{normal}}^{\ast }<{\mathrm{v}}^{\ast },\end{array}$$

where \({\mathrm{v}}^{\ast }=\mathrm{v}/\sqrt{\mathrm{g}\ {\mathrm{L}}_{\mathrm{leg}}}\) is the non-dimensional walking speed (v is the absolute gait speed, \({\mathrm{L}}_{\mathrm{leg}}\) is the leg length, and g is the gravitational acceleration), and \({\overline{\mathrm{v}}}_{\mathrm{normal}}^{\ast }\) and \({\upsigma}_{\mathrm{normal}}^{\ast }\) are the mean and standard deviation, respectively, of the non-dimensional normal comfortable walking speed of the subject group [28, 56].
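The post-hoc characterization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, g is taken as 9.81 m/s², and the sample standard deviation is used because the text does not say which estimator was applied.

```python
import math
from statistics import mean, stdev

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def nondimensional_speed(v, leg_length):
    """Return v* = v / sqrt(g * L_leg) for a speed v (m/s) and leg length (m)."""
    return v / math.sqrt(G * leg_length)


def speed_thresholds(normal_v_stars):
    """Return (mean - sd, mean + sd) of the group's normal-speed v* values.

    The sample standard deviation is an assumption of this sketch.
    """
    m = mean(normal_v_stars)
    s = stdev(normal_v_stars)
    return m - s, m + s


def classify_speed(v_star, lower, upper):
    """Label one trial as slow, normal or fast from its v* value."""
    if v_star <= 0:
        raise ValueError("v* must be positive")
    if v_star <= lower:
        return "slow"
    if v_star <= upper:
        return "normal"
    return "fast"
```

The thresholds are computed once from the normal-speed trials of the whole group and then applied to every trial, which mirrors the three inequalities above.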

Once a walking trial was completely recorded, a 3D trajectory was created to show the path taken by each joint to which a marker was attached. These marker data were transformed using rigid-body kinematics into joint angles, which are 3D representations of body movements between segments over time. Subsequently, the kinematic gait parameters for each walking speed were determined from the synchronized coordinate and force data using the Vicon Nexus software.

2.3 Data rearrangement and labelling

A total of 459 gait cycles were collected from the walking trials using Vicon Nexus. However, 6 gait cycles were found to have missing values. Missing data are often a result of occlusion, as markers may be blocked by body parts or other objects during the tracking process. Motion data can also be corrupted by noise over long periods of time. Since incomplete data can distort the underlying pattern, these cycles were eliminated from the study. The data from the remaining 453 gait cycles consisted of kinematic parameters, which included the ankle, hip and knee angles in the sagittal, frontal and transverse planes (Fig. 2). These parameters were employed as the core features for differentiating gait speed when training our proposed neural network. Prior to training the machine learning model, data labelling was performed on each gait cycle. Categorical encoding was employed to convert the categorical data (“Speed”) into numerical form such that “Slow” = 0, “Normal” = 1, and “Fast” = 2. Information on the gait data used in the experiment is summarized in Table 1.
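A minimal sketch of the cleaning and labelling step follows. The record layout (a dictionary with hypothetical keys `speed` and `angles`, where `angles` is a list of per-frame joint-angle rows) is an assumption made for illustration; only the label mapping itself comes from the text.

```python
import math

# Label mapping stated in the text.
SPEED_TO_LABEL = {"Slow": 0, "Normal": 1, "Fast": 2}


def is_complete(cycle):
    """Return True when no joint-angle sample in the cycle is None or NaN."""
    return all(
        value is not None and not math.isnan(value)
        for frame in cycle["angles"]
        for value in frame
    )


def prepare(cycles):
    """Drop incomplete gait cycles and attach an integer label to the rest."""
    prepared = []
    for cycle in cycles:
        if not is_complete(cycle):
            continue  # incomplete cycles are excluded, as in the study
        prepared.append(
            {"angles": cycle["angles"], "label": SPEED_TO_LABEL[cycle["speed"]]}
        )
    return prepared
```

Integer labels of this form are what a sparse categorical loss expects, so no one-hot encoding is needed.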

Fig. 2 Overview of gait datasets

Table 1 Gait cycles for various walking speeds

Notably, 48,253 observations were collected from the abovementioned 453 gait cycles using the high-speed infrared cameras. To evaluate the classification performance, the data were divided into training and testing sets in an 80:20 ratio.
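One way to realise an 80:20 split is sketched below. The text does not state whether the split was made per observation, per gait cycle or per subject, nor whether it was shuffled or seeded; this sketch assumes a seeded shuffle over gait cycles, and the function name is hypothetical.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Splitting by whole gait cycles (or by subject) keeps frames from one cycle out of both sets at once, which is a design choice of this sketch rather than something reported in the study.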

2.4 Deep learning framework development and validation

The neural networks were trained using Keras with TensorFlow as its backend, on a free Tesla K80 GPU in Google Colab. A motion capture system provides time-series measurements of human gait. Therefore, an RNN was adopted to capture the sequential relationships in the acquired kinematic data. An RNN is a neural network architecture for handling sequential data (e.g., time series). It contains feedback connections between its units, allowing the network to link all prior inputs to its outputs (Fig. 3). While an RNN is in principle a powerful model, it can hardly handle very long sequences. The main reasons for this are the vanishing gradient and exploding gradient problems described in Bengio et al. [6]. A Long Short-Term Memory (LSTM) network addresses these problems: its cells contain gates and a self-loop that create paths along which the gradient can flow for long durations (Fig. 4). In addition, compared to classical machine learning, an LSTM network can handle raw data directly and does not require hand-crafted feature extraction from time series. To capture long-term dependencies in a sequence, a recurrent neural network (RNN) integrated with cuDNN LSTM (NVIDIA CUDA® Deep Neural Network Library Long Short-Term Memory) was introduced in this study. Note that the cuDNN LSTM has been reported to be about 2 to 2.8 times faster than the standalone LSTM model [17]. It is therefore beneficial for long sequence data, as the time taken to train is significantly reduced.

Fig. 3 A simple recurrent neural network with an input layer (x), hidden state (h) and output (o) at timestep t. The hidden state, h, serves as the memory of the network and is calculated from the previous hidden state and the input at the current step

Fig. 4 A block diagram of an LSTM memory cell. There are three major components in an LSTM cell: the input gate, the forget gate and the output gate. Each gate contains a sigmoid nonlinear activation function. The input gate decides the amount of information to be added to the cell state. The forget gate determines how much of the previous internal state to remember. Meanwhile, the output gate controls the information passed to the output based on the cell state
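For reference, the gating behaviour described for the LSTM memory cell is commonly written as the following update equations. This is the standard textbook formulation and is not taken from the study itself; the symbols are assumptions of this sketch: \(x_t\) is the input at timestep t, \(h_t\) the hidden state, \(c_t\) the cell state, \(W\), \(U\) and \(b\) the learned weights and biases of each gate, \(\sigma\) the sigmoid function, and \(\odot\) the element-wise product.

$$\begin{array}{l} i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\ f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\ o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\ \tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ h_t = o_t \odot \tanh\left(c_t\right) \end{array}$$

The additive form of the cell-state update \(c_t\) is what provides the path along which gradients can flow over long durations.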

In this study, the proposed model consisted of 6 CuDNN LSTM layers of sizes 768, 640, 512, 384, 256 and 128, followed by a dense layer of size 32 and an output layer of size 3 (Fig. 5). The network was trained with the Adam optimizer for 2000 epochs using a batch size of 1000 and a learning rate of 0.0001. To prevent the model from overfitting the training data, a dropout of 0.2 was applied to all CuDNN LSTM layers. The rectified linear unit (ReLU) was adopted as the non-linear activation function in the densely connected layer. In the output layer, a softmax activation function was employed to calculate the probability of each of the studied walking speed classes. The developed model was validated by predicting the gait speeds of the testing data (20%), and the prediction accuracy is reported.

Fig. 5 The architecture of the proposed deep learning neural framework

3 Results and discussion

3.1 Classification performance of RNN integrated with cuDNN-LSTM deep learning framework

We trained our proposed neural network to estimate the walking speeds based on the kinematic lower limb joint data, which were calculated using the motion capture analysis software Nexus (VICON). Our model took 2 h and 46 min to complete the training. To analyse the classification performance, both loss and accuracy metrics were evaluated and visualized. A loss function provides an estimate of the loss that the classifier incurs as a result of disagreement between the predicted label and the true label in the training data. We selected sparse categorical cross entropy as the loss function in our study. Back-propagation was then used to reduce the value of the loss function with respect to the model’s parameters by changing the weight values through an optimization algorithm, which in this model was Adaptive Moment Estimation (Adam). Accuracy, meanwhile, refers to the proportion of correct predictions for the test data, and was assessed across epochs to ensure stability. The visualization provides an interpretable way to determine which model is best at identifying the relationships between the kinematic variables based on the training data. To review the performance metrics during the training of our proposed deep learning model, graphs of accuracy and loss on the training and test datasets over 2000 epochs are illustrated in Fig. 6.
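To make the two metrics concrete, the sketch below computes softmax probabilities, the sparse categorical cross-entropy for one sample, and accuracy over a batch in plain Python. It illustrates the standard definitions only; it is not the Keras implementation used in the study, and the function names and the clipping constant `eps` are assumptions.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(true_label, probs, eps=1e-7):
    """Loss for one sample: -log of the probability given to the true class."""
    return -math.log(max(probs[true_label], eps))


def accuracy(true_labels, prob_rows):
    """Fraction of samples whose most probable class equals the true label."""
    correct = 0
    for label, probs in zip(true_labels, prob_rows):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted == label
    return correct / len(true_labels)
```

With three speed classes, a model that assigns equal probability to each class has a loss of ln 3 (about 1.10) per sample, which gives a reference point for reading a loss curve.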

Fig. 6 Training and test (A) accuracy and (B) loss of the classifier

As shown in Fig. 6, the curves for the training data are plotted in red and those for the test data in blue. Both plots show good convergence of the model across epochs with regard to loss and classification accuracy. In Fig. 6a, the accuracy increases sharply at around the 300th epoch, and the increase slows down after this point. By 2000 epochs, the training and test accuracies reach 98.7% and 97.3%, respectively, and show no further increase with more iterations. Therefore, 2000 epochs is considered the ideal number of training epochs for this model. In Fig. 6b, the test loss decreases to a point of stability and has a small gap with the training loss. Therefore, it can be deduced that a good fit is achieved by our model. The test and training losses level off at around the 1800th and 1900th epochs, respectively.

The proposed deep learning model was then used to predict the walking speed of gait data chosen from the test sample. The classifier was run on the test samples (20%) to assess the prediction confidence. The results showed that the built model had an average accuracy of 97.5% over a total of 20 predictions (Table 2). A confusion matrix was computed to visualize the overall classification results (Fig. 7). The off-diagonal elements range from 0.01 to 0.02, indicating low misclassification. Therefore, we can deduce that the CuDNN LSTM model is promising for learning from sequential data to evaluate the walking cycle, as it is able to capture the long-term dependencies of the gait data and to map sequences of past observations to walking speed.
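A confusion matrix of this kind can be computed as sketched below. The row normalisation is an assumption: fractional off-diagonal values such as 0.01 to 0.02 suggest a normalised matrix, but the text does not say how it was normalised. The function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes=3):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its total so that every non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized
```

In a row-normalised matrix the diagonal entries are the per-class recall, and each off-diagonal entry is the fraction of one true speed class that was assigned to another class.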

Table 2 Validation of deep learning model
Fig. 7 Confusion matrix of CuDNNLSTM model in gait datasets

3.2 Correlation coefficient between walking speed and kinematic parameters

As mentioned above, the test accuracy of 97.3% indicates a relationship between gait speed and the joint angles in the different planes. To understand the correlation between these parameters and walking speed, a standard correlation coefficient between every pair of attributes was computed. The results are shown in Table 3. In Fig. 8, we observe an increasing trend in the amplitude of the signal as the walking speed increases from slow to fast. This signifies a positive correlation, which is consistent with the outcomes shown in Table 3.
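The text does not name the correlation measure; assuming that the "standard correlation" is Pearson's correlation coefficient, it can be computed between one joint-angle attribute and the encoded speed label as in the sketch below. The function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)
```

A value near +1 means the attribute increases with the speed label (0 = slow, 1 = normal, 2 = fast), while a value near 0 means the attribute carries little linear information about speed.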

Table 3 Correlation between each attribute with walking speed
Fig. 8 (A) Knee, (B) hip and (C) ankle angles for slow, normal and fast walking speeds in the sagittal, frontal and transverse planes

From Table 3, the knee flexion angles in the sagittal, transverse and frontal planes showed the largest sensitivity to walking speed. Meanwhile, the ankle plantarflexion angles in the sagittal and transverse planes had a lower correlation with walking speed than the other joints. This can be explained through the visual representation of the joint angles in each plane throughout the gait cycle at the knee, hip and ankle in Fig. 8.

From Fig. 8, as walking speed increases, so do the peak joint angles at the knee, regardless of the anatomical plane. This result matches previous research conducted by Mentiplay et al. [40]. Generally, at the beginning of the swing phase, the foot goes from plantar flexion to dorsal flexion. As the knee joint is a hinge-type joint, it allows flexion and extension, with a small amount of rotation and gliding motion [10]. The hips move from extension to flexion, and the pelvis rotates and changes its tilt. An increasing tilt of the trunk in the direction of progression brings the body’s centre of mass forward and assists in increasing the rate of movement. For a fast walking speed, higher power is generated at the knee joint to achieve a longer ambulated distance. Therefore, a higher flexion peak at the knee can be observed across the sagittal, frontal and transverse planes. The force transferred to the ankle joint compensates for forward and backward displacement of the body’s centre of gravity in order to stabilize the foot and propel the limb forward. Thus, the correlation coefficient for the ankle angle in the sagittal plane (see Table 3) is low, and the ankle angle in the sagittal plane, as shown in Fig. 8c, is consistent across slow, normal and fast walking gait. Although the ankle joint is uniaxial, its axis is oblique; therefore, the movement in the transverse plane is less significant during walking.

4 Conclusion

In this paper, we proposed a method for classifying walking at various speeds by extracting features with an RNN integrated with a cuDNN LSTM network. The training and testing accuracies of our proposed method were 97.5% and 97.3%, respectively. Despite the high accuracy, this study has some limitations. Firstly, only young and healthy subjects were recruited, hence our kinematic findings may not be applicable to older adults. In addition, because the stride interval varies between subjects, the average stride times of the subjects were not employed as an input to our proposed model. This might cause the samples in the training data to be non-representative, thus affecting the accuracy of the model. Therefore, the average gait data of all subjects should be taken into consideration in future work to achieve higher accuracy and precision. Besides, expanding the data can help improve generalization and reduce the variability of the models. Note that the study conducted by Mentiplay et al. [40] highlighted that gait speed affects the amplitude of spatiotemporal gait parameters, joint kinematics, joint kinetics and ground reaction forces, with a decrease at slow speeds and an increase at fast speeds relative to the comfortable speed. Considering this, kinetic parameters such as ground reaction force should be employed together with kinematic parameters as determinants of gait pattern in different age populations in future studies.