1 Introduction

Systems that enable human–machine interactions and spatial modeling have a wide range of applications in modern engineering, robotic and biomedical devices [1].

While complex synchronized video camera systems [5–7] represent precise but expensive technical solutions, it is possible to use much cheaper systems that employ depth sensors to acquire data with sufficient accuracy for many applications. The Microsoft (MS) Kinect™ [2–4, 8] allows the recording of such data sets via its image and depth sensors (illustrated in Fig. 1a) and the transfer of these data to appropriate mathematical environments, such as MATLAB, for further processing. The acquired data sets can then be used to propose methods and algorithms for movement analyses, scene modeling [9], gesture and body recognition [10–12], rehabilitation [13] and posture reconstruction [14–16]. These new devices, combined with motion sensors and specific control units, are also often used in robotic systems control [11, 17–19].

This article is devoted to the use of the MS Kinect system for movement data acquisition, detection of gait features and analysis of gait disorders [20–24] via selected digital signal and image processing methods. The proposed graphical user interface was used to acquire data in a clinical environment from patients with Parkinson’s disease [25–27] and from healthy individuals to form a reference dataset. Specific algorithms were then designed and used for motion tracking, gait feature evaluation and the classification of the observed sets of individuals. The obtained results were evaluated from both the engineering and neurological perspectives.

The proposed methods show how modern sensors can be used to acquire video and depth frame matrices that enable human–machine interaction. The application discussed here is related to the analysis of gait disorders but could be further extended to other areas, including rehabilitation engineering and robotic systems control.

2 Methods

2.1 Data acquisition

The information related to the body motions of the selected individuals was recorded with the MS Kinect sensors, which are illustrated in Fig. 1a. The RGB camera in the middle of the device recorded video image frames with a resolution of 480 × 640 pixels and a frequency of 30 frames/s. The depth sensor consists of an infrared projector (on the left) and an infrared camera (on the right) that use the structured light principle [28, 29] to detect the distances of image pixels with a precision of 4–40 mm depending upon the distance from the sensor. The resulting matrix has a size of 480 × 640.

Fig. 1

An example frame recorded by the MS Kinect including: a the MS Kinect’s RGB camera for video image recording and the depth sensor, which consists of a projector and a receiver, for the acquisition of depth frame matrices, b the image frame matrix combined with the skeleton estimate and c the contour plot of the depth frame matrix with distances from the selected plane

Figure 1b, c present portions of selected frames that were recorded by the image and depth sensors. The selected image presented in Fig. 1b was combined with the skeleton projection and the estimated positions of the joints. Figure 1c illustrates information from the depth sensor. The contour plot in Fig. 1c presents the distances of the individual pixels from a selected plane that was at a distance of 2,200 mm from the MS Kinect. The structure of the whole system for data acquisition is presented in Fig. 2.
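The contour representation in Fig. 1c can be reproduced with a minimal MATLAB sketch such as the following one, where depthFrame is an assumed 480 × 640 depth matrix in millimeters and zero-valued pixels are treated as missing measurements.

  refPlane = 2200;                     % selected reference plane [mm]
  D = double(depthFrame);              % assumed 480 x 640 depth matrix [mm]
  D(D == 0) = NaN;                     % zero depth marks missing measurements
  dist = refPlane - D;                 % signed distances from the plane

  contourf(dist, 20)                   % contour plot of the distances, as in Fig. 1c
  axis ij equal tight
  colorbar
  title('Distances from the selected plane [mm]')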

Fig. 2

The coordinate system for the MS Kinect data acquisition and the distributions of image and depth sensor errors based on the differences between consecutive submatrices in their static regions for selected data frames

The proposed graphical user interface (GUI) was used to record MS Kinect data from the observed individuals and to process the information in the MATLAB (version 2014b) environment [30]. The GUI was designed to allow the simple recording of video frames in clinical environments via the following steps:

  1. the recording of the name and surname of the patient;

  2. an initialization of the MS Kinect system (connected through a USB port); and

  3. the recording or interruption of data acquisition.

Additional functions of the graphical user interface included the selection of further recording parameters and the previewing of image and depth sensor data from the database. The skeleton tracking algorithm, which processes these data, also provides information about the locations of the joints; the joint numbering and the connection map are presented in Table 1.
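Such a recording can be reproduced, for instance, with the Kinect support of the MATLAB Image Acquisition Toolbox. The following sketch is only illustrative; the frame count, the timeout and the metadata field names are assumptions that should be checked against the toolbox version used.

  % Acquisition of synchronized image and depth streams with skeleton tracking
  colorVid = videoinput('kinect', 1);          % RGB camera (640 x 480)
  depthVid = videoinput('kinect', 2);          % depth sensor (640 x 480)
  depthSrc = getselectedsource(depthVid);
  depthSrc.TrackingMode = 'Skeleton';          % enable joint estimation

  nFrames = 300;                               % about 10 s at 30 frames/s (assumption)
  colorVid.FramesPerTrigger = nFrames;
  depthVid.FramesPerTrigger = nFrames;

  start([colorVid depthVid]);                  % begin logging both streams
  wait(colorVid, 30); wait(depthVid, 30);      % wait until the acquisition stops

  [imgFrames, imgTime] = getdata(colorVid, nFrames);
  [depthFrames, depthTime, meta] = getdata(depthVid, nFrames);

  % Joint coordinates of the first frame (20 joints x 3 coordinates per skeleton)
  joints = meta(1).JointWorldCoordinates;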

Both the RGB camera and the depth sensor store information in 640 × 480 element matrices according to the schematic diagram of the system presented in Fig. 2. A histogram of the differences in static portions of consecutive image frames illustrates the accuracy of the MS Kinect image sensor. Figure 2 also presents a similar histogram of these differences for the static portions of the matrices that contain the depth sensor data. The accuracy of the system is fundamental for spatial data modeling [31–34] and, as expected, was in the range of −50 to 50 mm.
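The error histograms in Fig. 2 can be obtained, in principle, with a sketch such as the following one, which evaluates the frame-to-frame differences of the depth data in a manually selected static region; the region limits and the depthFrames array (as acquired above) are assumptions of this example.

  rows = 100:200;  cols = 300:400;                           % assumed static image region
  static = double(squeeze(depthFrames(rows, cols, 1, :)));   % depth values [mm]
  d = diff(static, 1, 3);                                    % differences of consecutive frames
  histogram(d(:), -50:5:50)                                  % distribution of depth sensor errors
  xlabel('Depth difference [mm]'), ylabel('Count')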

The experimental portion of this study was devoted to the analyses of the gaits of the following two sets of individuals: (1) 18 patients with Parkinson’s disease (PD) and (2) 18 healthy age-matched individuals (Norm). The MS Kinect used for data acquisition was installed approximately 60 cm above the floor. Each individual repeated a straight walk of approximately 4 m (five steps) back and forth 5 times. Each video record, acquired at a sampling rate of 30 frames/s, contained useful information about the straight walks but also unnecessary frames recorded during the turns.

2.2 Skeleton tracking and stride length estimation

The skeleton tracking algorithm processed data matrices from the image and depth sensors. The algorithm also provided coordinates that specified the spatial locations of all joints in the selected coordinate system, as illustrated in Fig. 2, by utilizing the joint numbering and connection maps defined in Table 1.

Table 1 Skeleton positions and connection map of an individual used for data acquisition and processing of video records

The steps of the proposed algorithm for gait feature detection using the MS Kinect can be summarized as follows (a MATLAB sketch of these steps is given after the list):

  1. a preprocessing of the skeleton data to remove gross errors and finite impulse response (FIR) filtering of the joint positions to minimize observation errors;

  2. the rejection of frames with substantial errors based on the temporal evolution of the centers of mass (COM) as evaluated from joints 1, 2 and 3 (i.e., the hip-center, spine and shoulder-center joints, respectively) and presented in Fig. 3a, b;

  3. the extraction of the gait segments in one direction and the rejection of distorted frames;

  4. the evaluation of the positions of the leg centers from joints 15, 16 and 19, 20 (i.e., the ankle and foot joints of the left and right legs, respectively) in each segment (Fig. 3a) and the estimation of the average step length via the evaluation of their Euclidian distances (Fig. 3c) and the detection of their maxima;

  5. the evaluation of the lengths of the legs of each individual based on the spatial positions of joints 13, 14, 15 and 17, 18, 19 (i.e., the hip, knee and ankle joints of the left and right legs, respectively), with the results presented in Fig. 4a, the averaging for each individual and the normalization of the step lengths; and

  6. the estimation of the stride features via the averaging of the normalized step lengths for each individual in each segment of a straight walk.
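The sketch below outlines steps 1–6 in MATLAB for a single straight-walk segment. It assumes that J is a 20 × 3 × N array of joint coordinates in meters (joint numbering as in Table 1); the FIR filter order and the peak-detection parameters are illustrative assumptions rather than the values used in the study.

  N = size(J, 3);                                 % number of frames in the segment
  b = fir1(10, 0.2);                              % low-pass FIR filter (assumed order)
  X = reshape(permute(J, [3 1 2]), N, []);        % time along rows, N x 60
  Xf = filtfilt(b, 1, X);                         % zero-phase smoothing of joint positions
  Jf = permute(reshape(Xf, N, 20, 3), [2 3 1]);   % back to 20 x 3 x N

  com = squeeze(mean(Jf(1:3, :, :), 1));          % centers of mass from joints 1-3 (3 x N)
  leftLeg = squeeze(mean(Jf(15:16, :, :), 1));    % left leg center (ankle and foot)
  rightLeg = squeeze(mean(Jf(19:20, :, :), 1));   % right leg center (ankle and foot)

  legDist = sqrt(sum((leftLeg - rightLeg).^2, 1));       % Euclidian leg-center distances
  steps = findpeaks(legDist, 'MinPeakDistance', 10);     % maxima = individual step lengths
  stepLength = mean(steps);                              % average step length [m]

  segLen = @(j1, j2) squeeze(sqrt(sum((Jf(j1,:,:) - Jf(j2,:,:)).^2, 2)));
  legLen = (mean(segLen(13,14) + segLen(14,15)) + ...    % left hip-knee-ankle length
            mean(segLen(17,18) + segLen(18,19))) / 2;    % right hip-knee-ankle length
  strideFeature = stepLength / legLen;                   % step length normalized by leg length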

Further stride features, including walking speed, can be detected in a similar manner using data from the image and depth sensors of the MS Kinect.
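Walking speed, for example, can be estimated from the smoothed centers of mass and the 30 frames/s sampling rate; the short sketch below assumes the com matrix computed above.

  fs = 30;                                        % sampling rate [frames/s]
  v = sqrt(sum(diff(com, 1, 2).^2, 1)) * fs;      % instantaneous speed of the COM [m/s]
  walkingSpeed = mean(v);                         % average walking speed of the segment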

Fig. 3

Detection of the right and left leg movements illustrating: a the temporal evolution of the centers of mass and leg centers in three-dimensional space, b the temporal evolution of the positions of the centers of mass above the horizontal plane and c the distances between the leg centers for a selected walk segment

3 Results

Table 2 presents descriptions of the data sets from the 18 individuals (12 men and 6 women) with Parkinson’s disease and the 18 controls (7 men and 11 women). All patients met the UK Parkinson’s Disease Society Brain Bank clinical diagnostic criteria and were monitored in the movement disorder unit.

The numerical results obtained with the proposed algorithm after the reduction of observation errors, using data acquired with the MS Kinect at a sampling rate of 30 frames/s, are presented in Table 2. The resulting average stride lengths (SLs) suggest that this system could be used to classify individuals with Parkinson’s disease (SL = 0.38 m, SD = 0.06) using the age-matched individuals (SL = 0.53 m, SD = 0.05) as a reference set. As expected, the average stride length was shorter for the PD group than for the reference set.

Table 2 Stride length results for the two sets of individuals (PD: Parkinson’s disease, Norm: control) with the standard deviations (SD)

The leg lengths were analyzed separately to normalize the stride lengths of the specific individuals and to increase the reliability of the gait features. The joint positions in three-dimensional space were estimated with the MS Kinect system and used to evaluate the Euclidean distances between the hip and knee and the knee and ankle. The sums of these values for each leg were used to estimate leg lengths. The average difference in the lengths of the legs of each of the 36 individuals was 11 mm with a standard deviation of 8 mm, which was within a normal range according to a long history of clinical observations. The average values of the leg lengths of each individual were calculated (across all subjects, the average leg length was 0.784 m with a standard deviation of 0.011 m) and used for the stride length normalization. Figure 4a presents these results for the reference set of subjects.

Fig. 4

Selected features presenting: a the individual leg lengths evaluated from the MS Kinect joint detection for the reference set and the age dependence of the stride lengths for b the individuals with Parkinson’s disease (PD set) and c the age-matched controls (reference set)

Figure 4b, c present the age-dependent stride lengths for the individuals with Parkinson’s disease (positive set) and the age-matched controls (negative set). For the positive set, a decrease in the stride lengths with age can be observed (regression coefficient RC = −0.0082 m/year), whereas no such dependence exists for the reference set. For this reason, no normalization to the age of the individuals was performed in this study.
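The reported regression coefficient corresponds to a simple linear fit of the normalized stride lengths against age; a minimal sketch, assuming vectors age and strideLen for the PD set, is:

  p = polyfit(age, strideLen, 1);                 % linear regression of stride length on age
  RC = p(1);                                      % regression coefficient [m/year]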

Figure 5 presents the distributions of stride lengths for the individuals with Parkinson’s disease (positive set) and the age-matched controls (negative set), and these data correspond to those in Table 2.

A more detailed analysis of the data involving sensitivity and specificity [35] is presented in Fig. 5. As shown in Fig. 5a, it was possible to determine the true negative (TN) and false positive (FP) rates in the negative set (i.e., the controls) and the true positive (TP) and false negative (FN) rates in the positive set (i.e., the patients with Parkinson’s disease) for any given threshold step length value. Next, the accuracy (ACC), sensitivity (SE) and specificity (SP) were evaluated according to the following relations:

$$\mathrm{ACC} = \frac{\mathrm{TN}+\mathrm{TP}}{\mathrm{TN}+\mathrm{FP}+\mathrm{TP}+\mathrm{FN}},$$
(1)
$$\mathrm{SE} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},$$
(2)
$$\mathrm{SP} = \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}$$
(3)

with respect to the selected criterion (step length) value. The resulting plot in Fig. 5b shows that it was possible to achieve an accuracy of 91.7 % at the optimal step length threshold of 0.47 m.
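Relations (1)–(3) can be evaluated over a range of candidate thresholds as in the following minimal sketch, where slPD and slNorm are assumed to contain the normalized stride lengths of the positive and negative sets and an individual is classified as positive when the stride length falls below the threshold.

  thr = 0.30:0.01:0.60;                           % candidate stride length thresholds [m]
  [ACC, SE, SP] = deal(zeros(size(thr)));
  for k = 1:numel(thr)
      TP = sum(slPD < thr(k));                    % patients correctly detected
      FN = sum(slPD >= thr(k));                   % patients missed
      TN = sum(slNorm >= thr(k));                 % controls correctly rejected
      FP = sum(slNorm < thr(k));                  % controls falsely detected
      ACC(k) = (TN + TP) / (TN + FP + TP + FN);   % Eq. (1)
      SE(k) = TP / (TP + FN);                     % Eq. (2)
      SP(k) = TN / (TN + FP);                     % Eq. (3)
  end
  [bestACC, i] = max(ACC);
  fprintf('ACC = %.1f %% at a threshold of %.2f m\n', 100*bestACC, thr(i))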

Fig. 5

Stride length analysis for the individuals with Parkinson’s disease (positive set) and the age-matched controls (negative set) presenting: a the distributions of true and false results across criterion values, b the accuracy achieved with the optimal criterion value and c the sensitivity/specificity plots

Figure 6 presents the average stride lengths (with standard deviations) of the individuals in the positive and negative sets and the results of the accuracy analyses. These results suggest that stride length could be a useful feature for the classification of individuals into Parkinson’s disease and non-Parkinson’s disease groups.

Fig. 6

The average stride lengths (with standard deviations) of a the patients with Parkinson’s disease and b the reference set of individuals

4 Conclusion

Human–machine interaction and computer intelligence are key components of the rapidly developing interdisciplinary field that combines sensor technology, data fusion, computer vision, image processing, control engineering and robotics. Numerous papers have been devoted to the identification and detection of motion features [36–38] with applications in biomedical signal processing and the diagnoses of gait disorders [39, 40]. The latest research [41, 42] related to wearable and non-wearable systems indicates the increasing interest in portable systems and specific body sensors for gait analysis.

Motion analysis and Parkinson’s disease recognition can be performed with specialized and expensive camera systems equipped with specific sensors; such systems are commonly used for the detection of movement with high accuracy. This paper presents a new approach to the analysis of gait disorders that utilizes the relatively inexpensive MS Kinect. The MS Kinect has a depth sensor accuracy of 4–40 mm, which is sufficient for many applications. The results obtained suggest that the MS Kinect can be used for the detection of gait disorders and for the recognition of Parkinson’s disease. The maximum accuracy observed in the present study was 91.7 %.

Further work will be devoted to the study of more extensive data sets and the evaluation of a higher number of parameters to increase the classification accuracy of motion features. We assume that the combination of data from an increased number of biosensors will produce pattern matrices that can be used for more accurate classifications across a wide range of criteria values and provide tools for remote diagnostics and wireless data processing.