1 Introduction

Human activity recognition is one of the most interesting research topics for several areas such as pervasive and mobile computing, ambient assisted living, surveillance-based security, sport and fitness activities, and healthcare.

Over recent years, sensor technologies (especially high-capacity, low-power, low-cost, miniaturized and lightweight sensors), wired, wireless and hybrid communication network protocols, as well as signal processing theory, have greatly progressed.

Wearable sensors, i.e. sensors that are positioned directly or indirectly on the human body, generate signals (accelerometric, PPG, ECG, sEMG, ...) when the user performs activities. Therefore they can monitor features that are descriptive of the person’s physiological state or movement.

These sensors can be embedded into clothes, shoes, belts, sunglasses, smartwatches and smartphones, or positioned directly on the body and can be used to collect information such as body position and movement, heart rate, muscle fatigue and skin temperature [3, 4].

Among wearable sensors, accelerometers are probably the most frequently used for activity monitoring. In particular, they are effective in monitoring actions that involve repetitive body motions, such as walking, running, cycling, sitting, standing, and climbing stairs.

On the one hand, activity classification using accelerometers can be obtained using one or more sensors on the body [7, 12]. However, single-sensor systems are more practical, and in this case common location choices are the waist, upper arm, wrist, and ankle [13, 15, 16]. The waist location has been used extensively in physical activity measurements because it captures major body motions, but algorithms using waist data can underestimate overall energy expenditure on activities such as bicycling or arm ergometry, where the waist movement is uncorrelated with the movement of the limbs. Therefore several recent studies addressed the problem of detecting human activities from smartphones and smartwatches [1, 8, 11].

On the other hand, accurate estimation of biometric parameters recorded from subjects’ wrist or waist, when the subjects are performing various physical exercises, is often a challenging problem due to the presence of motion artifacts. In order to reduce the motion artifacts, data derived from a triaxial accelerometer have been proven to be very useful [6].

Wrist-worn sensor devices can be comfortably used during activities of daily living, including sleep; they can remain active while the user changes clothes and do not require special belts or clips, thus increasing the wear time. Therefore smartwatches for human activity monitoring are becoming very important tools in personal health monitoring. In particular, exercise routines and repetitions can be counted in order to track a workout routine as well as determine the energy expenditure of individual movements. Indeed, mobile fitness coaching has involved topics ranging from the quality of performing such sports actions to the detection of the specific sports activity [5].

However, wearable devices such as smartphones and smartwatches are in general differently oriented during human activities, so the data derived from the three axes are mixed up.

This paper proposes an efficient technique for real-time recognition of human activities, and transitions between them, by using accelerometer data. The proposed technique is based on singular value decomposition (SVD) and truncated Karhunen-Loève transform (KLT) for feature extraction and reduction, and Bayesian classification for class recognition. The algorithm is independent of the orientation of the sensor, making it particularly suitable for implementation in wearable devices such as smartphones, where the orientation of the sensor can be unknown or its placement may not always be correct. In order to demonstrate the validity of this technique, it has been successfully applied to a database of accelerometer data derived from static postures, dynamic activities, and postural transitions occurring between the static postures.

The paper is organized as follows. Section 2 provides a brief overview of the human activity classification algorithm. Section 3 presents the experimental results carried out on a public domain data set in order to show the effectiveness of the proposed approach. Finally, Sect. 4 summarizes the conclusions of the present work.

2 Recognition Algorithm

This section presents a description of the overall algorithm, based on the dimensionality-reduced singular value spectrum of the data Hankel matrix and on the Bayesian classifier, that is able to identify human activity classes from accelerometer data.

A schematic diagram of the activity detection algorithm is shown in Fig. 1.

Fig. 1 Flow chart of the proposed framework for human activity classification (x, y, z are the 3-axial accelerometer signals)

2.1 Data Preprocessing and Feature Extraction

Let x, y, z be the accelerometer signals. After a preprocessing stage where the raw data are segmented into windows of \(N+L-1\) samples, the resulting accelerometer signals are arranged as follows.

Let \(x_t = [ x(t) \ldots x(t+N-1) ]^T\), \({x_t^{(i)}} = x_{t+i-1}\), and \(X_t = [ {{x_t^{(1)}} \ldots {x_t^{(L)}}} ] \). Analogously, let \(Y_t = [ {{y_t^{(1)}} \ldots {y_t^{(L)}}} ]\) and \(Z_t = [ {{z_t^{(1)}} \ldots {z_t^{(L)}}} ]\). The matrices \(X_t\), \(Y_t\), and \(Z_t\) so built are the Hankel data matrices of the three accelerometer signals, where \({x_t^{(i)}} ,\; {y_t^{(i)}},\; {z_t^{(i)}}\), \(i = 1, \ldots , L\), represent the observations obtained from the three-axis accelerometer, each shifted in time by \(i-1\) samples.
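The construction of one Hankel data matrix can be sketched as follows. This is only an illustrative sketch: the helper name `hankel_matrix` and the toy signal are assumptions introduced here, not part of the original work.

```python
import numpy as np

def hankel_matrix(signal, N, L):
    """Return the N x L Hankel matrix whose columns are delayed copies of the window.

    `signal` must hold exactly N + L - 1 samples (one preprocessing window).
    """
    signal = np.asarray(signal, dtype=float)
    if signal.shape != (N + L - 1,):
        raise ValueError("expected a window of N + L - 1 samples")
    # 0-based column i holds signal[i : i + N], i.e. the vector x_t^{(i+1)}.
    return np.column_stack([signal[i:i + N] for i in range(L)])

# Toy window with synthetic values (not taken from the data set): N = 4, L = 3.
X = hankel_matrix(np.arange(6), N=4, L=3)
print(X)
```

Each anti-diagonal of the resulting matrix is constant, which is the defining property of a Hankel matrix.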

The complete matrix of sample signals

$$\begin{aligned} H_t = [X_t\,Y_t\,Z_t] \in {\mathbb {R}^{N \times 3L}} \end{aligned}$$
(1)

can be represented by the singular value decomposition (SVD) as

$$\begin{aligned} H_t = S_t \varLambda _t {R_t^T} = \mathop \sum \limits _{i = 1}^N {\lambda _i}{s_i}r_i^T, \end{aligned}$$
(2)

where, if \(N<3L\), \(S_t = \left[ {{s_1} \ldots {s_N}} \right] \), \(R_t = \left[ {{r_1} \ldots {r_N}} \right] \), with \(s_i\), \(r_i\) being the corresponding left and right singular vectors, and \(\lambda _i\) are the singular values in decreasing order \({\lambda _1} \ge {\lambda _2} \ge \cdots \ge {\lambda _N}\).

Since \(H_t \in \mathbb {R}^{N \times 3L}\) is the data matrix of the accelerometer signals at each time instant t, a feature vector \(\xi _t\) has to be derived from this matrix in order to apply the human activity classification algorithm.

We noticed that different types of activities lead to different distributions of the energy of the accelerometer signals among the singular vectors of \(H_t\). Thus, a suitable candidate for identifying the type of activity is the spectrum of singular values, collected with a slight abuse of notation in the vector \(\varLambda _t = [ \lambda _1 \dots \lambda _N ]\) and normalized so as to avoid dependence on the intensity of the activity. Therefore we choose \(\xi _t = \varLambda _t/||\varLambda _t||\), where \(|| \cdot ||\) represents the norm of a vector.
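A possible way to compute this feature vector, and to check numerically that it does not change when the sensor axes are rotated, is sketched below. The function names, the synthetic random window and the use of the Euclidean norm are assumptions made for illustration; the text does not state which norm is used.

```python
import numpy as np

def hankel_matrix(signal, N, L):
    """N x L Hankel matrix built from a window of N + L - 1 samples."""
    signal = np.asarray(signal, dtype=float)
    return np.column_stack([signal[i:i + N] for i in range(L)])

def singular_value_feature(x, y, z, N, L):
    """Normalized singular value spectrum of H = [X Y Z] (N values when N < 3L)."""
    H = np.hstack([hankel_matrix(s, N, L) for s in (x, y, z)])
    sv = np.linalg.svd(H, compute_uv=False)   # returned in decreasing order
    return sv / np.linalg.norm(sv)            # Euclidean norm: an assumption

rng = np.random.default_rng(0)
N, L = 96, 33
acc = rng.standard_normal((3, N + L - 1))     # synthetic 3-axis window

# Random orthogonal 3 x 3 matrix (via QR) modelling an unknown sensor orientation.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
rotated = Q @ acc

xi = singular_value_feature(*acc, N, L)
xi_rot = singular_value_feature(*rotated, N, L)
print(xi.shape, np.allclose(xi, xi_rot))
```

The check works because mixing the three axes with an orthogonal matrix amounts to multiplying \(H_t\) on the right by an orthogonal matrix, which leaves its singular values unchanged.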

In order to face the problem of dimensionality, the usual choice [10] is to reduce the vector \({\mathrm{}{\xi }}_t\) to a vector \({\mathrm{}{k}}_{tM}\) of lower dimension by a linear non-invertible transform \(\mathrm{}{\Psi }\) (a rectangular matrix) such that

$$\begin{aligned} {\mathrm{}{k}}_{tM}= \mathrm{}{\Psi } \; {\mathrm{}{\xi }}_t \;, \end{aligned}$$
(3)

where \(\mathrm{}{\xi }_t \in \mathbb {R}^N\), \({\mathrm{}{k}}_{tM}\in \mathbb {R}^M\), \(\mathrm{}{\Psi } \in \mathbb {R}^{M \times N}\), and \(M \ll N\).

It is well known that, among the allowable linear transforms \(\mathrm{}{\Psi } : \mathbb {R}^N \rightarrow \mathbb {R}^M\), the Karhunen-Loève transform truncated to \(M < N\) orthonormal basis functions is the one that ensures the minimum mean square error.

This normalized singular value spectrum can easily be computed immediately after having performed the SVD on the accelerometer signals, and used as input to the Bayesian classifier after a KLT-based dimensionality reduction from \(N=96\) to \(M=10\).
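A minimal sketch of the reduction in (3) is given below, assuming that the truncated KLT basis is estimated as the leading eigenvectors of the sample covariance of the training features (the estimator is not specified in the text). The name `fit_klt` and the random stand-in data are hypothetical.

```python
import numpy as np

def fit_klt(features, M):
    """Return Psi (M x N): rows are the M leading covariance eigenvectors."""
    C = np.cov(features, rowvar=False)        # N x N sample covariance
    eigval, eigvec = np.linalg.eigh(C)        # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:M]      # indices of the M largest
    return eigvec[:, order].T

rng = np.random.default_rng(1)
train = rng.random((500, 96))                 # synthetic stand-in for training features
Psi = fit_klt(train, M=10)

k = Psi @ train[0]                            # Eq. (3): reduced feature vector
print(Psi.shape, k.shape)
```

The rows of `Psi` are orthonormal, so the transform is a projection onto a 10-dimensional subspace and cannot be inverted, in agreement with the description of \(\Psi\) as a rectangular matrix.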

The algorithm developed in this section follows the approach reported in [5] as it was successfully adopted in the physical exercise identification for photoplethysmography artifact reduction.

2.2 Bayesian Classification

Let us refer to a frame \({\mathrm{}{k}}_{tM}[n]\), \(n = 0, \dots , M - 1\), containing features extracted from the accelerometer signals.

We assume that the observations for all human activities that need to be identified are acquired and divided into two sets, \(\mathcal {W}\) for training and \(\mathcal {Z}\) for testing.

For Bayesian classification, a group of \(\varGamma \) activities is represented by the probability density functions (pdfs) \(p_{\gamma }({\mathrm{}{k}}_{tM} ) = p({\mathrm{}{k}}_{tM} \;|\; \theta _{\gamma })\), \(\gamma = 1,2,\ldots , \varGamma \), where \(\theta _{\gamma }\) are the parameters to be estimated during training. Thus we can define the vector \({\mathrm{}{p}} =[ p_{1}({\mathrm{}{k}}_{tM} ) , \ldots , p_{\varGamma }({\mathrm{}{k}}_{tM} ) ]^T\).

The objective of classification is to find the model \(\theta _{\gamma }\) corresponding to the activity \(\gamma \) which has the maximum a posteriori probability for a given frame \({\mathrm{}{k}}_{tM} \in \mathcal {Z}\). Formally:

$$\begin{aligned} \widehat{\gamma }( {\mathrm{}{k}}_{tM} ) = \mathop {\text {argmax}}_{1 \le \gamma \le \varGamma } \left\{ p(\theta _{\gamma } \;|\; {\mathrm{}{k}}_{tM}) \right\} = \mathop {\text {argmax}}_{1 \le \gamma \le \varGamma } \left\{ \frac{p( {\mathrm{}{k}}_{tM} \;|\; \theta _{\gamma })p(\theta _{\gamma }) }{ p( {\mathrm{}{k}}_{tM} ) } \right\} \;. \end{aligned}$$
(4)

Assuming equally likely activities (i.e. \(p(\theta _{\gamma }) = 1/\varGamma \) ) and noting that \(p({\mathrm{}{k}}_{tM} )\) is the same for all activity models, the Bayesian classification is equivalent to

$$\begin{aligned} \widehat{\gamma }({\mathrm{}{k}}_{tM}) = \mathop {\text {argmax}}_{1 \le \gamma \le \varGamma } \left\{ p_{\gamma } ({\mathrm{}{k}}_{tM} ) \right\} \;. \end{aligned}$$
(5)

Thus Bayesian identification reduces to solving the problem stated by (5).

The most generic statistical model one can adopt for \(p({\mathrm{}{k}}_{tM} \;|\; \theta _\gamma )\) is the Gaussian mixture model (GMM) [14]. The GMM for a single activity is a weighted sum of F component densities, given by the equation

$$\begin{aligned} p( {\mathrm{}{k}}_{tM} \;|\; \theta _{\gamma }) \,=\,\sum _{i=1}^F\alpha _i\, \mathcal {N}( {\mathrm{}{k}}_{tM} \;|\; \mathrm{}{\mu }_i,\mathrm{}{C}_i) \end{aligned}$$
(6)

where \(\alpha _i\), \(i=1,\ldots ,F\) are the mixing weights, and \(\mathcal {N}({\mathrm{}{k}}_{tM}\;|\; \mathrm{}{\mu }_i,\mathrm{}{C}_i)\) represents the density of a Gaussian distribution with mean \(\mathrm{}{\mu }_i\) and covariance matrix \(\mathrm{}{C}_i\). It is worth noting that \(\alpha _i\) must satisfy \(0\le \alpha _i\le 1\) and \(\sum _{i=1}^F\alpha _i=1\) and \(\theta _{\gamma }\) is the set of parameters needed to specify the Gaussian mixture, defined as \( \theta _{\gamma } = \{ \mathrm{}{\alpha }_1, \mathrm{}{\mu }_1, \mathrm{}{C}_1, \ldots , \mathrm{}{\alpha }_F, \mathrm{}{\mu }_F, \mathrm{}{C}_F \}\).
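The decision rule (5) with the mixture density (6) can be sketched as follows. The mixture parameters below are made up for illustration, and the training procedure of [9] is not reproduced; the function names are likewise assumptions. Log densities are used only for numerical stability and do not change the argmax.

```python
import numpy as np

def log_gaussian(k, mu, C):
    """Log density of a multivariate Gaussian N(k | mu, C)."""
    d = k - mu
    _, logdet = np.linalg.slogdet(C)
    maha = d @ np.linalg.solve(C, d)          # squared Mahalanobis distance
    return -0.5 * (len(k) * np.log(2 * np.pi) + logdet + maha)

def log_gmm(k, theta):
    """Log of Eq. (6); theta is a list of (alpha_i, mu_i, C_i) triples."""
    terms = [np.log(a) + log_gaussian(k, mu, C) for a, mu, C in theta]
    m = max(terms)
    return m + np.log(sum(np.exp(t - m) for t in terms))   # log-sum-exp

def classify(k, models):
    """Eq. (5): 0-based index of the model with the largest likelihood."""
    return int(np.argmax([log_gmm(k, theta) for theta in models]))

# Two hypothetical activity models in M = 2 dimensions (made-up parameters).
I2 = np.eye(2)
models = [
    [(0.5, np.array([0.0, 0.0]), I2), (0.5, np.array([1.0, 0.0]), I2)],
    [(1.0, np.array([5.0, 5.0]), I2)],
]
print(classify(np.array([0.4, 0.1]), models),
      classify(np.array([4.8, 5.3]), models))
```

Note that the returned index is 0-based, whereas \(\gamma\) runs from 1 to \(\varGamma\) in the text.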

To estimate the mixture parameters we chose an unsupervised algorithm for learning a finite mixture model from multivariate data that overcomes the main shortcomings of the standard expectation maximization (EM) algorithm, i.e. sensitivity to initialization and selection of the number F of components [9]. This algorithm integrates both model estimation and component selection, i.e. the ability to choose the best number of mixture components F according to a predefined minimization criterion, in a single framework.

3 Experimental Results

We used the “Human Activity Recognition Using Smartphones” data set [2]. This data set includes data recorded from experiments made by a group of 30 volunteers, each of whom performed six different activities, three belonging to the “static” class, i.e., standing, sitting, and lying, and three belonging to the “cyclic” class, i.e., walking, climbing stairs, and descending stairs. Data recorded during the transitions occurring between static postures were labeled accordingly as “transitions”. 3-axial linear acceleration was recorded at a 50 Hz sampling rate and the experiments were video-recorded to allow accurate manual data labeling.

Table 1 Confusion matrix of the exercise type identifier evaluated on the whole testing set
Table 2 Performance (sensitivity, precision, and F1-score) of the exercise type identifier evaluated on the whole testing set
Fig. 2 Eigenvector projections

The signals so gathered were split into 2.56 s long windows with 50 % overlap. Windows containing unlabeled portions of signal were discarded, as were windows containing more than 25 % of signal with inconsistent labeling (i.e., a label that differs from that of the majority of the data points within the window). This yielded a total of 10991 windows to process, 7808 of which were used for training and the remaining 3183 for testing. Separation was done so that data recorded by any given person never occurred in both the training and testing sets.
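The windowing and label-filtering rules just described can be sketched as below. The width of 128 samples follows from 2.56 s at 50 Hz and the step of 64 samples from the 50 % overlap; the `UNLABELED` marker, the function name and the synthetic label track are assumptions introduced for illustration.

```python
import numpy as np

UNLABELED = -1  # hypothetical marker for samples without a label

def labeled_windows(labels, width=128, step=64, max_minority=0.25):
    """Yield (start, majority_label) for every window that passes both rules."""
    labels = np.asarray(labels)
    for start in range(0, len(labels) - width + 1, step):
        w = labels[start:start + width]
        if np.any(w == UNLABELED):
            continue                          # rule 1: discard unlabeled portions
        values, counts = np.unique(w, return_counts=True)
        if (width - counts.max()) / width > max_minority:
            continue                          # rule 2: more than 25 % off-label samples
        yield start, int(values[counts.argmax()])

# Synthetic label track: 200 samples of class 0 followed by 184 samples of class 1.
labels = np.array([0] * 200 + [1] * 184)
print(list(labeled_windows(labels)))
```

In this toy track the window starting at sample 128 straddles the class change with 56 of 128 samples off-label, so it is dropped, while the window starting at sample 192 has only 8 off-label samples and is kept with the majority label.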

Data were pre-processed by using these windows to build three \(N\times L\) Hankel matrices, one for each acceleration direction, with \(N=96\) and \(L=33\). These are then fed together to the SVD, so as to remove the effect of sensor orientation, and all the ensuing normalized singular values are used, after dimensionality reduction to \(M=10\) principal components, as the feature vectors for the classifier.
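As a consistency check on these dimensions (a worked calculation added here from the figures stated in the text), the window length in samples matches the requirement of Sect. 2.1:

$$\begin{aligned} 2.56\,\text {s} \times 50\,\text {Hz} = 128 = N + L - 1 = 96 + 33 - 1 , \qquad H_t \in \mathbb {R}^{96 \times 99} . \end{aligned}$$

Since \(N = 96 < 3L = 99\), the SVD yields exactly \(N = 96\) singular values, which is the dimension of the feature vector before the reduction to \(M=10\).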

As a first experiment, the performance of the classifier in recognizing the exact activity being performed was assessed. Of course, this method was never intended to be able to discriminate between all these activities, since mixing the axes of the accelerometer output clearly makes it nigh impossible to discriminate, e.g., between the static postures. This is clearly shown in Table 1, where the confusion matrix of this classification experiment is reported. The resulting performance is reported in Table 2, yielding an overall accuracy of 52.15 % with an F1-score of 43.22 %.

Despite these results, it is quite clear from observation of Table 1 that the recognizer very seldom makes mistakes between the three classes of exercises (static postures, cyclic movements, transitions). A scatter plot of the first two features, shown in Fig. 2, also confirms this idea, as in just two dimensions the three classes are very well separated.

This is better assessed by the second experiment, where only these three classes were considered for training the models. The confusion matrix is shown in Table 3, and the performance in Table 4. The overall accuracy in this context is 98.65 % and the F1-score is 96.48 %, denoting the high reliability of the chosen features to discriminate between these classes. It should also be recalled that a number of windows, 5.8 % to be exact, contain up to 25 % of “noise”, i.e., data belonging to different classes, so some classification error is to be expected.
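For reference, the per-class figures of merit named in the table captions (sensitivity, precision, F1-score) and the overall accuracy can be derived from a confusion matrix as sketched below. The counts are made up for illustration and are not the values of Table 3; the convention that rows are true classes and columns are predicted classes is also an assumption, as is the function name.

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of windows of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                          # correctly classified windows
    sensitivity = tp / cm.sum(axis=1)         # recall, per true class
    precision = tp / cm.sum(axis=0)           # per predicted class
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = tp.sum() / cm.sum()
    return sensitivity, precision, f1, accuracy

# Made-up 3-class counts for illustration only (NOT the values of Table 3).
cm = [[50, 0, 0],
      [0, 45, 5],
      [0, 5, 45]]
sens, prec, f1, acc = per_class_metrics(cm)
print(sens, prec, f1, acc)
```

How the per-class F1-scores are aggregated into the single overall figure is not stated in the text, so no averaging is shown here.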

Table 3 Confusion matrix of the human activity class identifier evaluated on the whole testing set
Table 4 Performance (sensitivity, precision, and F1-score) of the activity class identifier evaluated on the whole testing set

4 Conclusion

In this paper we present a feature set, based on the dimensionality-reduced singular value spectrum of the data Hankel matrix, that is suitable to identify human activity classes from accelerometer data. Since the singular value spectrum is inherently invariant with respect to rotations of the sensor axes, classification accuracy is independent of the orientation of the sensor, freeing the user from having to worry about its correct placement.

Experiments conducted on a public domain data set show the effectiveness of the proposed approach.