
1 Introduction

Gestures are an effective way to convey ideas in human-to-human and human-to-computer communication, especially when verbal language is absent [1]. People use gestures even while speaking. Conventionally, humans communicate with machines through standard input devices such as keyboards, mice and joysticks [1]. However, these input modes can be inefficient and are often unsuitable for people with disabilities. In such cases, recognized gestures can serve as inputs to the machine, enabling better human-computer interaction.

Current technology can capture the human body in 3-D space using human motion sensing devices, one example being Microsoft's Kinect sensor [2,3,4]. Complex body gestures related to different physical disorders can be identified reliably using the Kinect. The device uses its built-in RGB camera and 3D depth sensor to map the human body to a skeletal form. Owing to its low cost, it is widely used in many application areas.

The body gestures considered in this chapter are symptoms of physical disorders involving muscle and joint pain in elderly persons. Common causes of these disabilities are injury, fatigue and aging. The disorders progress to an advanced stage through negligence, bad habits and aging. The home-monitoring system described here can therefore serve as an alternative to frequent and troublesome hospital visits [1].

Several approaches have previously been proposed for gesture recognition in the elderly healthcare domain, using the Microsoft Kinect sensor to gather gesture information. Desk jobs demand long working hours in the same sitting posture, which leads to deterioration in the functioning of the tendons and joints. In [5], the authors propose a technique for recognizing the symptoms of physical disorders at an early stage, using principal component analysis for linear dimensionality reduction and the fuzzy c-means algorithm. The work in [6] addresses a similar problem for young persons by computing Euclidean distances in each frame, with the ReliefF algorithm used to reduce space complexity; classification is performed with a fuzzy k-nearest neighbour classifier. Parajuli et al. proposed a method for monitoring senior health using the Kinect sensor [7]. They estimate when elderly people are likely to fall by measuring gait, and recognition is performed with a support vector machine. In current social settings, where both members of a couple often work outside the home, elderly healthcare is a major concern. In [8], an ensemble decision tree is used with the Kinect sensor to detect falls of elderly persons. Yu et al. presented an approach to analyzing children's tantrum behaviour [9]. The paper exploits medical knowledge and questionnaire-based attitude investigation. Principal component analysis is applied for dimensionality reduction, Euclidean distance is used to estimate the proximity between behaviours such as push, shout and attack, and k-means clustering is finally applied.

In this chapter, we explain how to build a real-time home-monitoring system that uses Kinect to alert subjects when symptoms of muscle and joint pain are observed. Elderly persons perform most of their daily activities while sitting on a chair, so fourteen distinct body gestures associated with muscle and joint pain are considered with the subject seated. The subjects' gestures are captured as joint coordinates using the Kinect sensor, and ten features are extracted from each gesture. Classification is then carried out using a neural network trained with Levenberg-Marquardt optimization. In this way, elderly people can be assisted in their own homes through monitoring of their day-to-day activities. Whenever the system detects a physical disorder by examining the gestural features, an alarm is generated and the subject is recommended specific exercises for the recognized disorder. If, however, the same disorder persists in a subject for a long time, the subject is advised to consult a doctor.
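The alert policy described above (alarm and exercise recommendation on every detection, doctor referral when a disorder keeps recurring) can be sketched as follows. This is not the authors' implementation: the class name `HomeMonitor`, the idea of counting distinct days, and the threshold `PERSISTENCE_DAYS` are hypothetical, since the chapter only says "for a long time".

```python
from collections import defaultdict

# HYPOTHETICAL threshold: the chapter only says "for a long time".
PERSISTENCE_DAYS = 7


class HomeMonitor:
    """Sketch of the alert loop described in the text (not the authors' code)."""

    def __init__(self, persistence_days=PERSISTENCE_DAYS):
        self.persistence_days = persistence_days
        # (subject, disorder) -> set of distinct days on which it was detected
        self.history = defaultdict(set)

    def on_detection(self, subject, disorder, day):
        """Return the actions triggered when `disorder` is recognized for `subject`."""
        days_seen = self.history[(subject, disorder)]
        days_seen.add(day)
        actions = ["alarm", f"recommend exercises for disorder {disorder}"]
        if len(days_seen) >= self.persistence_days:
            actions.append("advise consulting a doctor")
        return actions


monitor = HomeMonitor(persistence_days=3)
for day in (1, 2, 3):
    actions = monitor.on_detection("subject_1", disorder=5, day=day)
# On the third distinct day, actions also contains "advise consulting a doctor".
```

Counting distinct days rather than raw detections keeps a single bad session from triggering a referral; any real threshold would have to come from the consulting doctors.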

Section 2 introduces some preliminary ideas. Sections 3 and 4 describe the proposed work and the experimental results, respectively. Section 5 concludes the chapter, and Sect. 6 provides the Matlab code.

2 Preliminary Knowledge

This section briefly introduces the Kinect sensor, the physical disorders considered and the neural network used.

2.1 Microsoft’s Kinect Sensor

Kinect was developed by Microsoft mainly for real-time gaming [2,3,4], but the sensor also has great potential for home monitoring. It resembles a webcam and contains one RGB camera, one IR (infrared) emitter and one IR receiver (Fig. 1). The x and y coordinates are measured with the RGB camera and the depth coordinate with the IR camera (Fig. 2). Combining these two sources of information, the human body is skeletonized into twenty 3D body joints (Fig. 3). The device can record data within a distance of 1.2–3.5 m [10, 11]. The Kinect sensor can operate twenty-four hours a day and in almost all lighting conditions (except fluorescent light). The subject's clothing also does not affect the recognition process (unless he or she is dressed entirely in black).
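A minimal sketch of how one captured frame might be held in memory and checked against the 1.2–3.5 m working range quoted above. The joint names are those of the Kinect for Windows SDK v1 skeleton; the assumption that a frame arrives as 60 numbers (x, y, z per joint, in this order) is ours, not the chapter's, and the helper names are hypothetical.

```python
# Joint names as used by the Kinect for Windows SDK v1 skeleton (20 joints).
# ASSUMPTION: a frame lists the joints in this order, x y z per joint.
JOINTS = [
    "HipCenter", "Spine", "ShoulderCenter", "Head",
    "ShoulderLeft", "ElbowLeft", "WristLeft", "HandLeft",
    "ShoulderRight", "ElbowRight", "WristRight", "HandRight",
    "HipLeft", "KneeLeft", "AnkleLeft", "FootLeft",
    "HipRight", "KneeRight", "AnkleRight", "FootRight",
]

MIN_DEPTH_M, MAX_DEPTH_M = 1.2, 3.5  # working range quoted in the text


def make_frame(values):
    """Turn 60 numbers (x, y, z for each joint) into {joint name: (x, y, z)}."""
    if len(values) != 3 * len(JOINTS):
        raise ValueError(f"expected {3 * len(JOINTS)} values, got {len(values)}")
    return {name: tuple(values[3 * i:3 * i + 3]) for i, name in enumerate(JOINTS)}


def within_range(frame):
    """True if every joint's depth coordinate (metres) lies inside the sensor range."""
    return all(MIN_DEPTH_M <= z <= MAX_DEPTH_M for _, _, z in frame.values())


frame = make_frame([0.0, 0.5, 2.0] * 20)
print(within_range(frame))  # True
```

Frames that fail the range check can simply be dropped before feature extraction.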

Fig. 1

Kinect sensor

Fig. 2

RGB image with its corresponding body joints as recognized using Kinect sensor

Fig. 3

Twenty body joints as captured using Kinect sensor

2.2 Considered Muscle and Joint Pains

Although old age has no hard boundary, after consulting eminent doctors we consider subjects older than 40 years as elderly. From this age onward, people tend to show symptoms of muscle and joint pain. If these disorders are treated at an early stage, they can be prevented from progressing to an advanced stage. This chapter considers fourteen gestures performed while the subject is seated. The disorders are described in Table 1.

Table 1 The addressed fourteen gestures related to elderly healthcare

2.3 Neural Network with Levenberg-Marquardt Optimization

This section briefly discusses the feed-forward neural network. A feed-forward neural network consists of an input layer, one or more hidden layers and an output layer, and the weighted sums of the inputs flow in one direction only, forward (Fig. 4) [1]. Although several hidden layers can be used, a single hidden layer with enough neurons can approximate any continuous input-output mapping [1]. If the results are unsatisfactory, the number of neurons in the hidden layer can be changed. In this work, the best results were obtained with one hidden layer of ten neurons. The mapping from the input domain to the output domain is learned by starting from randomly initialized weights and adjusting them to reduce the error between the target and the output. In the back-propagation algorithm, the weights are adapted from the last layer back to the first (Fig. 5) [1]. Among the weight adaptation techniques available for back-propagation, Levenberg-Marquardt optimization (LM-NN) [12, 13] is chosen because it outperforms gradient descent search and conjugate gradient (or quadratic approximation) methods on medium-sized problems.
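To make the layer structure concrete, here is a minimal forward pass for a network with ten input features, one hidden layer of ten neurons and fourteen outputs (one per disorder). It is a sketch under stated assumptions: the tanh hidden activation, the linear output units, the uniform weight initialization and the "largest output wins" decoding are our choices for illustration, not details given in the chapter, and no training is performed here.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 10, 10, 14  # features, hidden neurons, disorders


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in) and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def layer(x, weights, biases, activation):
    """Each neuron outputs activation(weighted sum of its inputs + bias)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, hidden, output):
    """Signals flow input -> hidden -> output only (no feedback)."""
    h = layer(x, *hidden, math.tanh)        # ASSUMED tanh hidden units
    return layer(h, *output, lambda v: v)   # ASSUMED linear output units


def predict(x, hidden, output):
    """Gesture number (1..14) of the largest network output."""
    y = forward(x, hidden, output)
    return max(range(len(y)), key=y.__getitem__) + 1


rng = random.Random(0)
hidden = init_layer(N_INPUT, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUT, rng)
print(predict([0.1] * N_INPUT, hidden, output))
```

Training would adjust `hidden` and `output` with back-propagation; the Levenberg-Marquardt update rule for that step is discussed next.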

Fig. 4

Feed forward neural network

Fig. 5

Back propagation neural network

The drawback of gradient descent learning [13] is its very slow convergence. Rather than taking steps of a fixed size along the negative gradient of the error function, it takes steps that are a fixed multiple η (the learning rate) of the negative gradient, as in (1), so the steps become minute wherever the gradient is small.

$$ w_{i + 1} = w_{i} - \eta \,\nabla \,E\left( w \right) $$
(1)

As a result, it moves quickly in steep regions of the error surface (large gradient) but only slowly in valleys (small gradient) (Fig. 6). Optimization can be accelerated by using curvature information. Curvature comes from second-order information (the Hessian), which is costly to compute. Hence, most techniques obtain the gradient from first-order derivatives and approximate the curvature rather than computing second derivatives exactly.
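The valley behaviour can be reproduced with rule (1) on a toy quadratic error E(w) = ½(c₁w₁² + c₂w₂²) with one steep direction (c₁ = 10) and one shallow direction (c₂ = 0.1). The curvatures, learning rate and step count are illustrative choices; η must stay below 2/c₁ = 0.2, or the steep direction diverges, and that same cap is what makes progress along the shallow direction so slow.

```python
def gradient_descent(w, curvatures, eta, steps):
    """Rule (1) on E(w) = 0.5 * sum(c_k * w_k**2), whose gradient is c_k * w_k."""
    for _ in range(steps):
        grad = [c * wk for c, wk in zip(curvatures, w)]
        w = [wk - eta * gk for wk, gk in zip(w, grad)]
    return w


# Steep direction (curvature 10) and shallow valley (curvature 0.1).
w = gradient_descent([1.0, 1.0], curvatures=[10.0, 0.1], eta=0.09, steps=50)
print(w)  # first weight ~1e-50, second still about 0.64
```

Each step multiplies the steep coordinate by 0.1 but the shallow one only by 0.991, which is exactly the imbalance that curvature information is meant to correct.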

Fig. 6

Convergence using gradient descent learning

The error surface is described by the mean squared error function (2), where the mean ⟨·⟩ is taken over all input-output training pairs.

$$ E\left( w \right) = \left\langle {\left( {f\left( {x;w} \right) - y} \right)^{2} } \right\rangle $$
(2)

where

  • x is the input vector to the neural net,

  • w is the weight matrix of the interconnections,

  • f(.) gives the output vector from the neural net,

  • y is the target vector for the neural net, and

  • E(.) is the error, a function of the weights during the training phase.

When f(.) is a linear model, E is a quadratic expression and its minimum can be evaluated directly, without a steepest-descent search. Thus, approximating f(.) as linear, the weight update rule becomes (3), where d is the gradient of E with respect to w and H is an estimate of the Hessian matrix [13] computed from first-order derivatives of f (the outer product of its gradient, averaged over the training pairs).

$$ w_{i + 1} = w_{i} - H^{ - 1} d $$
(3)
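A minimal sketch of rule (3), assuming a hypothetical linear model f(x; w) = w₀ + w₁x and the mean squared error (2). For a linear model, H = 2⟨∇f ∇fᵀ⟩ is the exact Hessian of E, so one step from any starting point lands on the least-squares minimum; the 2×2 system is solved by Cramer's rule to keep the example dependency-free.

```python
def mse(w, xs, ys):
    """Eq. (2) for the linear model f(x; w) = w[0] + w[1] * x."""
    return sum((w[0] + w[1] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def newton_step(w, xs, ys):
    """One application of rule (3): w - H^{-1} d."""
    n = len(xs)
    d0 = d1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        r = w[0] + w[1] * x - y      # residual f(x; w) - y; grad of f is (1, x)
        d0 += 2 * r / n              # d = dE/dw
        d1 += 2 * r * x / n
        h00 += 2 / n                 # H = 2 <grad f grad f^T>
        h01 += 2 * x / n
        h11 += 2 * x * x / n
    det = h00 * h11 - h01 * h01
    delta0 = (h11 * d0 - h01 * d1) / det   # H^{-1} d via Cramer's rule
    delta1 = (h00 * d1 - h01 * d0) / det
    return [w[0] - delta0, w[1] - delta1]


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 + 3.0 * x for x in xs]
w = newton_step([0.0, 0.0], xs, ys)
print(w, mse(w, xs, ys))  # [2.0, 3.0] 0.0
```

For a nonlinear network f, the same step is only a local quadratic approximation, which is why the damping of (4) and (5) is needed.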

However, E cannot be treated as quadratic everywhere on the error surface; the approximation is reliable only near the minimum. Levenberg therefore combined the two algorithms: the minimum is first approached roughly by gradient descent learning, and the quadratic approximation is then applied to fine-tune the result. The weights and all parameters are initialized arbitrarily, and the output and the corresponding error are evaluated. Levenberg's optimization rule (4) is then applied. An increase in the error implies that the quadratic approximation is failing, so λ is increased to move towards gradient descent. Conversely, a decrease in the error means that the minimum is near, so λ is decreased. The weight adaptation iterates until the error falls within a prescribed limit.

$$ w_{i + 1} = w_{i} - \left( {H + \lambda \,I} \right)^{ - 1} d $$
(4)

Marquardt noted that when λ is large, gradient descent search [13] dominates and the curvature information in H is largely ignored. He therefore modified Levenberg's equation to give the final Levenberg-Marquardt rule (5), in which the diagonal of the approximated Hessian matrix replaces the identity matrix, so that larger steps are taken along directions of low curvature.

$$ w_{i + 1} = w_{i} - \left( {H + \lambda \,diag\left[ H \right]} \right)^{ - 1} d $$
(5)
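The procedure around (4) and (5) can be sketched on a small curve-fitting problem standing in for the network weights. Assumptions, all ours: the two-parameter model f(x; a, b) = a·exp(bx), H ≈ JᵀJ and d = Jᵀr built from first derivatives only, an initial λ of 10⁻³, a change factor of 10, and retracting a step whenever the error rises. `marquardt=False` damps with λI as in (4); `marquardt=True` uses λ·diag[H] as in (5).

```python
import math


def model(p, x):
    """Hypothetical nonlinear model f(x; a, b) = a * exp(b * x)."""
    a, b = p
    return a * math.exp(b * x)


def sse(p, xs, ys):
    """Sum of squared errors (proportional to the mean in (2))."""
    try:
        return sum((model(p, x) - y) ** 2 for x, y in zip(xs, ys))
    except OverflowError:          # a wild trial step can overflow exp()
        return math.inf


def levenberg_marquardt(p, xs, ys, marquardt=True, lam=1e-3,
                        factor=10.0, max_iter=500, tol=1e-20):
    """Rule (4) (marquardt=False) or rule (5) (marquardt=True), two weights."""
    err = sse(p, xs, ys)
    for _ in range(max_iter):
        if err < tol:
            break
        # H ~ J^T J and d = J^T r, from first derivatives only.
        h00 = h01 = h11 = d0 = d1 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(p[1] * x)
            ja, jb = e, p[0] * x * e          # df/da, df/db
            r = p[0] * e - y                  # residual f(x; p) - y
            h00 += ja * ja
            h01 += ja * jb
            h11 += jb * jb
            d0 += ja * r
            d1 += jb * r
        s0, s1 = (h00, h11) if marquardt else (1.0, 1.0)  # diag[H] or I
        a00, a11 = h00 + lam * s0, h11 + lam * s1
        det = a00 * a11 - h01 * h01
        step0 = (a11 * d0 - h01 * d1) / det
        step1 = (a00 * d1 - h01 * d0) / det
        trial = (p[0] - step0, p[1] - step1)
        trial_err = sse(trial, xs, ys)
        if trial_err < err:    # error fell: accept, lean towards the quadratic step
            p, err, lam = trial, trial_err, lam / factor
        else:                  # error rose: retract, lean towards gradient descent
            lam = min(lam * factor, 1e12)
    return p


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
print(levenberg_marquardt((1.0, 0.0), xs, ys))  # close to (2.0, 0.5)
```

From the starting point (1, 0), the undamped quadratic step overshoots badly; the rejected steps raise λ until a shorter, gradient-like step is accepted, after which λ shrinks again and convergence becomes fast.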

3 Proposed Work

The block diagram of the elderly healthcare gesture recognition scheme is shown in Fig. 7. Let G (= 14) be the total number of physical disorders considered. To recognize an unknown gesture u, a training dataset is constructed from the gestural data of K different subjects for each disorder g, where g \( \in \) [1, G]. From each kth gesture, k \( \in \) [1, K], a total of F features (α) are extracted. The meaning of the fth feature, f \( \in \) [1, F], is given in Figs. 8 and 9. Using the twenty 3D joint coordinates captured by the Kinect sensor as raw information, features are extracted that distinguish a normal gesture from a gesture revealing a physical disorder. The feature vector consists of the ten features given in Fig. 8, where α is either D or A, denoting Euclidean distance and angle features respectively (Fig. 9). The joint names are those of Fig. 3. Hence, a 1 × F (= 10) feature vector is prepared for every frame captured by the Kinect sensor.

Fig. 7

Block diagram of the described system

Fig. 8

Feature vector

Fig. 9

Features extracted: (a) distance and (b) angle
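The two feature types in Fig. 9 can be sketched as below: a distance feature D is the Euclidean distance between two joints, and an angle feature A is the angle at a middle joint between two body segments. The ten actual joint pairs and triples are defined in Fig. 8 and are not reproduced here; the three entries in `FEATURES` are hypothetical placeholders that only show the mechanics, with frames given as {joint name: (x, y, z)}.

```python
import math


def distance(p, q):
    """Distance feature D: Euclidean distance between two 3D joints."""
    return math.dist(p, q)


def angle(a, b, c):
    """Angle feature A (degrees) at joint b, between segments b->a and b->c."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding


# HYPOTHETICAL feature list: the chapter's ten definitions are in Fig. 8.
FEATURES = [
    ("D", ("Head", "ShoulderCenter")),
    ("D", ("HandLeft", "HipLeft")),
    ("A", ("ShoulderRight", "ElbowRight", "WristRight")),
]


def feature_vector(frame):
    """One row of features for a frame given as {joint name: (x, y, z)}."""
    row = []
    for kind, joints in FEATURES:
        points = [frame[j] for j in joints]
        row.append(distance(*points) if kind == "D" else angle(*points))
    return row


frame = {
    "Head": (0.0, 1.0, 2.0), "ShoulderCenter": (0.0, 0.7, 2.0),
    "HandLeft": (-0.3, 0.0, 2.0), "HipLeft": (-0.1, -0.3, 2.0),
    "ShoulderRight": (0.2, 0.6, 2.0), "ElbowRight": (0.2, 0.3, 2.0),
    "WristRight": (0.5, 0.3, 2.0),
}
print(feature_vector(frame))  # [~0.3, ~0.36, 90.0]
```

Distances and angles are invariant to where the subject sits in the room, which is one reason such features suit a fixed home-mounted sensor.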

Figure 7 also depicts how an unknown gesture is identified: it is passed through the feature extraction step and then recognized by the neural network described in Sect. 2.3.

4 Experimental Results

Three datasets were prepared for this work, each with thirty subjects (K = 30), as described in Table 2. Table 3 lists sample feature vectors from training dataset 1, with the feature vector of a gesture for each disorder in a separate column; the gestures are numbered as in Table 1.

Table 2 Preparation of training dataset
Table 3 Sample feature vectors from training dataset 1

For performance analysis, the described method is compared with four well-known techniques: ensemble decision tree (EDT) [8], type-1 fuzzy classifier (T1FS) [14], support vector machine (SVM) [7] and k-nearest neighbour (kNN) [15]. The EDT classifier is based on adaptive boosting, with a maximum of 100 iterations. The T1FS algorithm measures the support of a feature vector using Gaussian membership functions. The SVM uses a radial basis function kernel with kernel parameter 1 and a cost value of 100. For kNN, k = 5, and the class of an unknown gesture is determined by majority voting among its five nearest neighbours under Euclidean distance.
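The kNN baseline as configured above (k = 5, Euclidean distance, majority voting) can be sketched in a few lines. The tie-breaking rule is our assumption, since the chapter does not state one: `Counter.most_common` orders equal counts by first occurrence, and because neighbours are visited nearest-first, a tie goes to the label of the closest neighbour.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training vectors nearest to `query`
    (Euclidean distance); ties go to the label met first, i.e. nearest."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10)]
train_y = [1, 1, 1, 1, 1, 2, 2, 2]
print(knn_predict(train_x, train_y, (0.2, 0.3)))    # 1
print(knn_predict(train_x, train_y, (10.5, 10.5)))  # 2
```

In the chapter's setting, each training vector would be a 1 × 10 gesture feature vector and each label one of the fourteen disorders.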

All of these algorithms are multiclass in nature except SVM, which is inherently binary. Performance is evaluated using positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, accuracy, average error rate (AER) and F1 score (F1 S), as given in (6–12). Here, TP, TN, FP and FN stand for true positives, true negatives, false positives and false negatives respectively, and in (12) precision and recall are the PPV (6) and the sensitivity (8). Figs. 10, 11 and 12 compare all the performance metrics for each training dataset. The three figures show that LM-NN performs best among the five algorithms for physical disorder recognition in elderly healthcare.

Fig. 10

Comparison of five algorithms for elderly healthcare for training dataset 1

Fig. 11

Comparison of five algorithms for elderly healthcare for training dataset 2

Fig. 12

Comparison of five algorithms for elderly healthcare for training dataset 3

$$ {\text{PPV}} = \frac{TP}{TP + FP} $$
(6)
$$ {\text{NPV}} = \frac{TN}{TN + FN} $$
(7)
$$ {\text{Sensitivity}} = \frac{TP}{TP + FN} $$
(8)
$$ {\text{Specificity}} = \frac{TN}{TN + FP} $$
(9)
$$ {\text{Accuracy}} = \frac{TP + TN}{TP + TN + FP + FN} $$
(10)
$$ {\text{AER}} = \frac{FP + FN}{TP + TN + FP + FN} $$
(11)
$$ {\text{F1}}\,{\text{S}} = 2 \times \frac{{{\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}} $$
(12)
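Equations (6)–(12) translate directly into code. Because the gesture problem has fourteen classes, the four counts must be formed per class; the one-vs-rest reduction from a confusion matrix shown here is a common convention and an assumption on our part, as the chapter does not state how TP, TN, FP and FN were counted.

```python
def binary_metrics(tp, tn, fp, fn):
    """Equations (6)-(12) from the four confusion counts."""
    total = tp + tn + fp + fn
    ppv = tp / (tp + fp)                  # (6), also called precision
    npv = tn / (tn + fn)                  # (7)
    sensitivity = tp / (tp + fn)          # (8), also called recall
    specificity = tn / (tn + fp)          # (9)
    return {
        "PPV": ppv,
        "NPV": npv,
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Accuracy": (tp + tn) / total,                       # (10)
        "AER": (fp + fn) / total,                            # (11)
        "F1S": 2 * ppv * sensitivity / (ppv + sensitivity),  # (12)
    }


def one_vs_rest_counts(confusion, c):
    """TP, TN, FP, FN for class c, where confusion[i][j] = true i predicted j."""
    n = len(confusion)
    tp = confusion[c][c]
    fp = sum(confusion[i][c] for i in range(n) if i != c)
    fn = sum(confusion[c][j] for j in range(n) if j != c)
    tn = sum(map(sum, confusion)) - tp - fp - fn
    return tp, tn, fp, fn


cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 0, 10]]
print(binary_metrics(*one_vs_rest_counts(cm, 0)))
```

Per-class values can then be averaged over the fourteen disorders to give one figure per metric.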

Three statistical tests are used to validate the neural-network approach for elderly healthcare. The first is McNemar's test. Let P and Q be two competing algorithms trained on the same dataset, let n01 be the number of cases misclassified by P but not by Q, and let n10 be the number of cases misclassified by Q but not by P. Under the null hypothesis that both classifiers have the same error rate, McNemar's statistic Z in (13) follows a χ2 distribution with 1 degree of freedom [16].

$$ Z = \frac{\left( \left| n_{01} - n_{10} \right| - 1 \right)^{2}}{n_{01} + n_{10}} $$
(13)

Table 4 shows that the null hypothesis is rejected wherever Z > 3.84, since 3.84 is the critical value of the χ2 distribution with 1 degree of freedom at a significance level of 0.05.
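A minimal sketch of the McNemar computation: count the two kinds of disagreement from the true labels and the predictions of P and Q, evaluate Z from (13) and compare it with the 3.84 threshold quoted above. The helper names are hypothetical, and returning 0 when the classifiers never disagree is our convention for the otherwise undefined 0/0 case.

```python
CHI2_CRIT_1DF = 3.84  # chi-square critical value, 1 degree of freedom, 0.05 level


def disagreement_counts(truth, pred_p, pred_q):
    """n01: wrong by P but right by Q; n10: wrong by Q but right by P."""
    n01 = sum(p != t and q == t for t, p, q in zip(truth, pred_p, pred_q))
    n10 = sum(q != t and p == t for t, p, q in zip(truth, pred_p, pred_q))
    return n01, n10


def mcnemar(n01, n10):
    """Continuity-corrected McNemar statistic Z of (13)."""
    if n01 + n10 == 0:
        return 0.0  # the classifiers never differ in correctness
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)


z = mcnemar(10, 2)
print(round(z, 3), z > CHI2_CRIT_1DF)  # 4.083 True
```

Only the cases where exactly one classifier is right enter the statistic; cases both get right or both get wrong carry no information about which is better.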

Table 4 Performance analysis using McNemar’s test

The next statistical analysis uses the Friedman test. Let \( r_{a}^{b} \) be the rank of the accuracy obtained by the ath algorithm (1 ≤ a ≤ A) on the bth dataset (1 ≤ b ≤ B). On each dataset, the best and worst classifiers receive ranks 1 and A, respectively. Table 5 lists the Friedman rankings [17].

Table 5 Performance analysis using Friedman and Iman-Davenport tests
$$ R_{a} = \frac{1}{B}\sum\limits_{b = 1}^{B} {r_{a}^{b} } $$
(14)
$$ \chi^{2} = \frac{12B}{A(A + 1)}\left[ {\sum\limits_{a = 1}^{A} {R_{a}^{2} } - \frac{{A(A + 1)^{2} }}{4}} \right] $$
(15)

For the current work, B = 3 and A = 5. As Table 5 shows, the null hypothesis is rejected, since the Friedman statistic (15), \( \chi_{F}^{2} = 9.71 \), exceeds the critical value (9.49) of the χ2 distribution with A − 1 = 4 degrees of freedom at a significance level of 0.05 [18].

The last analysis uses the Iman-Davenport test, whose statistic (16) follows an F distribution with (A − 1) and (A − 1) × (B − 1) degrees of freedom [17].

$$ F = \frac{{(B - 1) \times \chi^{2} }}{{B \times (A - 1) - \chi^{2} }} $$
(16)

The null hypothesis is therefore rejected, as F = 8.50 exceeds the critical value (3.84) of the F distribution with A − 1 = 4 and (A − 1) × (B − 1) = 8 degrees of freedom at a significance level of 0.05 [18].
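Equations (14)–(16) can be computed from a table of accuracies as sketched below, with `acc[b][a]` holding the accuracy of algorithm a on dataset b. Tied accuracies share the mean of the ranks they span, which is the usual Friedman convention and an assumption here. The accuracies in the usage example are made up for illustration and are not the chapter's results. The Iman-Davenport denominator vanishes when every dataset ranks the algorithms identically (χ² = B(A − 1)); the sketch returns infinity in that case.

```python
import math


def average_ranks(scores):
    """Ranks on one dataset: highest accuracy gets rank 1; ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda a: -scores[a])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def friedman_iman_davenport(acc):
    """acc[b][a]: accuracy of algorithm a on dataset b. Returns (R, chi2, F)."""
    B, A = len(acc), len(acc[0])
    per_dataset = [average_ranks(row) for row in acc]
    R = [sum(r[a] for r in per_dataset) / B for a in range(A)]           # (14)
    chi2 = 12 * B / (A * (A + 1)) * (sum(x * x for x in R)
                                     - A * (A + 1) ** 2 / 4)              # (15)
    denom = B * (A - 1) - chi2
    F = (B - 1) * chi2 / denom if denom > 0 else math.inf                # (16)
    return R, chi2, F


# Made-up accuracies: 3 datasets (rows) x 5 algorithms (columns).
acc = [[0.95, 0.90, 0.85, 0.80, 0.75],
       [0.95, 0.85, 0.90, 0.80, 0.75],
       [0.95, 0.90, 0.85, 0.75, 0.80]]
R, chi2, F = friedman_iman_davenport(acc)
print(R, round(chi2, 4), round(F, 4))  # chi2 = 10.9333, F = 20.5
```

The resulting χ² is compared with the χ² critical value for A − 1 degrees of freedom, and F with the F critical value for (A − 1) and (A − 1)(B − 1) degrees of freedom.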

5 Conclusion and Future Work

The described home-monitoring system for elderly healthcare is a novel and user-friendly method of recognizing physical disorders related to joint and muscle pain. The datasets were prepared in consultation with several doctors. The subject and the computer interfaced with the Kinect sensor form a closed loop. Whenever an abnormality is detected in the subject's normal day-to-day activities, an alarm is generated. If the physical disorder persists for several hours in a day, specific exercise videos are shown to the subject.

Although the work is demonstrated mainly for elderly people, it is equally relevant to young individuals working in multinational companies. Sedentary office work causes muscle and joint fatigue in employees. Early detection of these physical disorders would benefit the employees and, in turn, the overall health of the company. Because only the subject's skeleton is used, the subject's privacy is preserved. The work can also be applied to other areas, such as e-learning of dance and sign languages and training in various sports.

The Kinect sensor requires no refresh time and can run throughout the day in most lighting conditions. Its only disadvantage is its limited range, since it relies on IR sensing. In future work, we plan to explore other data acquisition techniques that overcome this limitation, and to introduce new gestures covering more physical disorders.

6 Matlab Codes

The input file ‘video_1.txt’ contains the 3D coordinates of the twenty body joints. The following Matlab code demonstrates the feature extraction procedure.

Sample Run: The Matlab code creates the skeletal image and the feature vector shown in Fig. 3 and Fig. 8, respectively.