1 Introduction

Research on human posture recognition has made great progress in recent years and has been applied to a wide range of areas, for example surveillance, healthcare, robotics, smart homes, smart cities and human–computer interaction [1,2,3,4]. Posture recognition aims to automatically classify the physical status of a subject, so as to determine whether the user requires assistance or guidance whenever an abnormal activity is detected. To date, most posture recognition systems operate by collecting sequential data obtained from cameras or wearable inertial sensors.

In a crowd environment, most existing human activity detection systems have been based on vision technologies [5]. Although vision-based systems have advantages (e.g. they are easy to operate and do not require collaboration from those being monitored), they face various challenges such as camera positioning, background clutter and limited coverage. Wearable sensor technologies provide a way of overcoming some of these shortcomings while enabling long-term recording. Wearable inertial sensors are often embedded into various wearable items or devices such as wristbands, shoes, smart phones, smart watches, and clothes [6, 7]. Wearable inertial sensors have a long history, going back to the advent of microprocessors when, for example, they were used to create guidance systems for poorly sighted people [8]. More recently, advances in technology have made them more feasible and culturally acceptable, since they are unobtrusive, light-weight, low-cost and power-efficient mobile devices [9]. However, they have their own limitations, such as sensor offset, drift, and sensitivity to location on the body.

This study was aimed at designing and developing a real-time monitoring system that was suitable for potential security or health incidents. For example, it might be worn by hospital patients, care home residents, police personnel, night-club bouncers or event-stewards (e.g. sports or concert events at a stadium) to summon help should they fall or be knocked to the ground. In order to explore the feasibility of developing such a system, this work sought to identify suitable data pre-processing techniques and model selection approaches. Thus, the main contributions of the paper are:

  • The design of an efficient accelerometer calibration algorithm for improving the sensor precision, and reducing its offset and drift.

  • The design of a reliable adaptive signal segmentation (ASS) algorithm for posture-based adaptive boundary point detection.

  • The development of an adaptive hybrid classifier which combines the above ASS algorithm with an MLP classifier incorporating a plurality voting approach which was evaluated using a real-time posture recognition framework based on simulated crowd security incident scenarios.

The rest of the paper is organized as follows. Section 2 reviews literature relating to data pre-processing and posture recognition algorithms. Section 3 explains the methodologies adopted for sensor calibration and hybrid model design in our prototype, along with an overview of the system architecture. Section 4 describes the experimental setup, procedures and results. Section 5 contains the conclusions and our thoughts on future research directions for this work.

2 Related work

Researchers have proposed various daily activity recognition systems using different methods and based on different sensors. For example, Cheok et al. [10] provided a review of hand-gesture recognition techniques that splits the process into different stages: data acquisition, pre-processing, feature extraction, signal segmentation and classification.

Data acquired from sensors usually contains errors arising from various sources including, for example, an improper zero reference. It is therefore essential to pre-process the data (i.e. perform sensor calibration), whether by the manufacturer or the user, to negate the effects of these errors. Bonnet et al. [11] proposed a unified calibration framework to determine inertial sensor calibration parameters such as sensor sensitivities, offsets, misalignment angles, and the mounting frame rotation matrix. Feature extraction and feature selection aim to maximize the classification accuracy while minimizing the number of features [12]. In this paper we propose a new and efficient sensor calibration method that is based only on six stationary positions.

The purpose of signal segmentation is to divide a signal into several epochs with the same statistical characteristics, such as amplitude or frequency. The Modified Varri method [13] for signal segmentation includes three parameters that affect its accuracy. These parameters must be determined experimentally, meaning they may not be optimal for an arbitrary signal segmentation application. Azami et al. [14] designed a genetic algorithm (GA), used as a powerful search tool, to look for appropriate parameters based on the Modified Varri approach. Later, Azami et al. [15] proposed an approach in which the signal is first filtered by a Moving Average or a Savitzky-Golay filter to reduce short-term noise, aimed at improving the reliability of the method. Novosadová et al. [16] described a polynomial model for signal segmentation that assumes every segment is a polynomial of a certain degree. Segment borders correspond to positions in the signal where the model changes its polynomial representation. They also demonstrated that using orthogonal polynomials, instead of standard polynomials, in the model is beneficial when the signals being segmented are corrupted by noise. Laguna et al. [17] demonstrated an event-based dynamic window method for signal segmentation. Their approach dynamically adjusts the window size and the shift at each step, so that a new window is generated only when a new event is detected, in contrast to the fixed-size sliding window approach. In this paper, we propose a hybrid ASS algorithm that combines the adaptive window and bottom-up methods to improve segmentation accuracy.

Human activity classification accuracy is determined by a number of factors such as optimal classifier selection, the data sensing approach, and the data sensing frequency. Gao et al. [18] investigated how the sampling frequency affected classifier performance by increasing the sampling rate from 10 to 200 Hz (in 10 Hz increments). They demonstrated that the recognition accuracy was not sensitive to the sampling rate (only a 1% increase above 20 Hz, stabilizing beyond 50 Hz). However, a high sampling rate can lead to a greater computing load and power requirements. Saini et al. [19] presented a Kinect-based two-person interaction monitoring framework for analysing the individual activities of people who may be suffering from some form of psychological disorder. They used a BLSTM-NN classifier to recognize each individual's activities, and applied a lexicon approach to improve the performance. Their testing achieved a maximum accuracy of 70.72% for 24 different activities. Hegde et al. [20] compared insole-worn and wrist-worn sensors for daily activity classification, and demonstrated that the recognition accuracy was 81% for an insole-worn sensor alone, 69% for a wrist-worn sensor alone, and 89% for the two sensors combined. Thus, insole-worn sensors present a compelling alternative or companion to wrist-worn devices. Macron et al. [21] proposed a human gesture recognition system based on volumetric data sequences, using an HMM classifier to identify a set of key postures and classifying their sequences over a set of possible actions. They also presented a simple method for identifying the number of hidden states of the HMMs, improving the gesture classification performance, and achieved a 96% accuracy for ten different actions. Naveed et al. [22] introduced heterogeneous feature-fusion methodologies for improving the performance of human activity recognition.
They exploited the time efficiency and optimality of SMO to train an SVM, which was tested on a single-person and a multi-human dataset, achieving accuracies of 91.99% and 86.48% respectively.

Our review of related work has demonstrated that each algorithm has its advantages and drawbacks. No one classifier works best for every problem as there are several practical factors to consider such as the size and structure of the dataset. Thus, for our machine learning work, it was necessary to employ many different algorithms in order to identify the most appropriate classifier. Hence, for this study, we designed a hybrid classifier (hAHC) for real-time posture recognition, which utilizes different algorithms for different tasks.

3 Methodologies

Our experimental data collection platform was constructed around ultra-wide band (UWB) anchors and tags. Anchors are fixed UWB hardware nodes, including at least one so-called Master Anchor that is responsible for collecting the data from the other anchors. Anchors send/receive messages to/from mobile tags.

3.1 The system infrastructure

The infrastructure consisted of a number of components as illustrated in Fig. 1.

Fig. 1
figure 1

The system configuration and the tag coordinate system

The sensing and communication hardware comprised a set of openRTLS UWB master anchors, with UWB tags [23], which were used for data collection in this study.

The tag was housed in a small belt-worn bag on the subject's waist, as shown on the right side of Fig. 1. Each tag contained several sensors and actuators, such as light, IR proximity, pressure and temperature sensors, and a 9-axis inertial measurement unit (IMU) that included accelerometers (ACCs), gyroscopes (gyros) and a magnetometer.

The tag communicated with the master anchor wirelessly using a UWB radio. The real-time posture recognition system (ARS) was executed on a laptop which received the tag datasets from the master anchor via UDP messages. Based on our previous work [24], we had determined that the sub-feature set (Ax, Ay, Az, Axyz, ΔA) had better performance and fewer features compared to other sub-feature sets, such as a combination of acceleration data with the associated gyro angles. Hence, for this study, the ARS was based only on the ACC dataset.

3.2 Tri-axial ACCs calibration

The IMU in our tags had not been factory-calibrated: the raw ACC measurements were not expressed at the ±g level (g = 9.8067 m/s²). As a consequence, it was necessary to calibrate the ACCs before the IMU could be used, so as to estimate misalignment errors, scale factors, and offsets [25].

The ACCs calibration utilizes the fact that ACCs are affected by gravity when they are under static conditions. The calibrated acceleration Ai (i = x, y, z) and the ACC raw measurements Âi (i = x, y, z) can be expressed as in (1).

$$\left[\begin{array}{c}{A}_{x}\\ {A}_{y}\\ {A}_{z}\end{array}\right]={\left[{A}_{m}\right]}_{3\times 3}\left[\begin{array}{ccc}{A}_{scX}& 0& 0\\ 0& {A}_{scY}& 0\\ 0& 0& {A}_{scZ}\end{array}\right]\left[\begin{array}{c}{\widehat{A}}_{x}-{b}_{x}\\ {\widehat{A}}_{y}-{b}_{y}\\ {\widehat{A}}_{z}-{b}_{z}\end{array}\right]=\left[\begin{array}{ccc}{X}_{11}& {X}_{12}& {X}_{13}\\ {X}_{21}& {X}_{22}& {X}_{23}\\ {X}_{31}& {X}_{32}& {X}_{33}\end{array}\right]\left[\begin{array}{c}{\widehat{A}}_{x}\\ {\widehat{A}}_{y}\\ {\widehat{A}}_{z}\end{array}\right]+\left[\begin{array}{c}{X}_{10}\\ {X}_{20}\\ {X}_{30}\end{array}\right]$$
(1)

where Am is the 3 by 3 misalignment matrix between the non-orthogonal device body axes and the orthogonal ACCs sensing axes; Asc is the scale factor, and b is the offset.

$$\left[\begin{array}{cc}{A}_{x}& \begin{array}{cc}{A}_{y}& {A}_{z}\end{array}\end{array}\right]=\left[\begin{array}{cc}\begin{array}{ccc}{\widehat{A}}_{x}& {\widehat{A}}_{y}& {\widehat{A}}_{z}\end{array}& 1\end{array}\right]\left[\begin{array}{c}\begin{array}{ccc}{X}_{11}& {X}_{21}& {X}_{31}\\ {X}_{12}& {X}_{22}& {X}_{32}\\ {X}_{13}& {X}_{23}& {X}_{33}\end{array}\\ \begin{array}{ccc}{X}_{10}& {X}_{20}& {X}_{30}\end{array}\end{array}\right]$$
(2)

The goal of ACC calibration is to resolve the 12 parameters X10 to X33. Thus, given any raw measurement Âi, Ai can be calculated from (1). In order to obtain the 12 parameters, Eq. (1) can be rewritten equivalently as (2).

Based on (2), calibration can be performed at 6 stationary positions comprising two directions, up and down, for each of the three axes X, Y and Z of the tag coordinate system. In theory, if the unit remains stationary relative to the earth's surface, an accelerometer will measure ±1 g along the vertical axis and 0 along the other two axes. Hence, there are 6 ideal outputs Ag, namely (±1 g, 0, 0), (0, ±1 g, 0) and (0, 0, ±1 g), for the 6 static conditions, as shown in (3).

$${A}_{g}={\left[\begin{array}{ccc}g& 0& 0\\ -g& 0& 0\\ 0& g& 0\\ 0& -g& 0\\ 0& 0& g\\ 0& 0& -g\end{array}\right]}_{6\times 3},\quad \mathrm{g}=9.8067\ \mathrm{m}/{\mathrm{s}}^{2}$$
(3)

Meanwhile, the sensor's raw data can be collected for a given period of time (e.g. 10 s) at each of the 6 stationary positions and averaged per position, thereby reducing the influence of random noise. This gives six averaged raw measurement vectors, arranged as shown in (4). Finally, the calibration parameter matrix X can be calculated using (5), according to (2), (3) and (4); here we apply the least-squares method. Once the calibration parameter matrix X was determined, the ACC raw data were calibrated using (2).

$$\widehat{A}={\left[\begin{array}{cccc}{\stackrel{-}{\widehat{\mathrm{A}}}}_{x1}& {\stackrel{-}{\widehat{\mathrm{A}}}}_{y1}& {\stackrel{-}{\widehat{\mathrm{A}}}}_{z1}& 1\\ \vdots & \vdots & \vdots & \vdots \\ {\stackrel{-}{\widehat{\mathrm{A}}}}_{x6}& {\stackrel{-}{\widehat{\mathrm{A}}}}_{y6}& {\stackrel{-}{\widehat{\mathrm{A}}}}_{z6}& 1\end{array}\right]}_{6\times 4}$$
(4)
$$X={\left[{\widehat{A}}^{T}\bullet \widehat{A}\right]}^{-1}\bullet {\widehat{A}}^{T}\bullet {A}_{g}$$
(5)

where \({\stackrel{-}{\widehat{\mathrm{A}}}}_{xi}\) is the average raw acceleration along the X-axis at the i-th of the 6 positions, \({\widehat{A}}^{T}\) denotes the matrix transpose, and the superscript −1 denotes the matrix inverse.

Compared to existing IMU calibration approaches, our method is efficient and easy to implement. The parameter matrix X is calculated offline; each measured raw ACC vector \(\left[\begin{array}{ccc}{\widehat{A}}_{x}& {\widehat{A}}_{y}& {\widehat{A}}_{z}\end{array}\right]\) is then calibrated online using (2). An advantage of this approach is that it requires only a software-based calibration phase, avoiding the need for the special mechanical platforms that feature in many other IMU calibration procedures [26, 27]. In addition, our method has fewer calibration parameters (12 vs. 18) than other methods, such as [28]. These differences give our approach a lower calibration cost than most other methods.
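The six-position least-squares solution described above can be sketched as follows. This is a sketch of the scheme rather than our exact implementation; the function names and the simulated readings are illustrative only.

```python
import numpy as np

G = 9.8067  # gravity in m/s^2

def solve_calibration(raw_means, ideal):
    """Solve the 12 calibration parameters X (a 4x3 matrix) in the
    least-squares sense, following Eqs. (2)-(5): ideal = [raw | 1] @ X.

    raw_means: (6, 3) averaged raw accelerations, one row per stationary position.
    ideal:     (6, 3) ideal outputs, e.g. (+g,0,0), (-g,0,0), ...
    """
    a_hat = np.hstack([raw_means, np.ones((raw_means.shape[0], 1))])  # (6, 4)
    x, *_ = np.linalg.lstsq(a_hat, ideal, rcond=None)
    return x  # (4, 3): the parameters X10..X33 in the paper's notation

def calibrate(raw, x):
    """Apply the offline-solved parameters to raw (N, 3) samples online."""
    return np.hstack([raw, np.ones((raw.shape[0], 1))]) @ x
```

Because the system has 6 equations and only 4 unknowns per axis, the least-squares fit also averages out residual noise across the six positions.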

3.3 Data collection and feature extraction

A calibrated raw dataset was organized as (t, Ax, Ay, Az), from which further features were extracted using (6) and (7).

$${A}_{xyz}\left(t\right)= \sqrt{{{A}_{x}(t)}^{2}+{{A}_{y}(t)}^{2}+{{A}_{z}(t)}^{2}}$$
(6)
$$\Delta A\left(t\right)=\left|{A}_{xyz}\left(t\right)-{A}_{xyz}\left(t-1\right)\right|$$
(7)

where Axyz is the magnitude of the three-dimensional acceleration and ΔA is the absolute change in Axyz between time points t and (t−1).

Since the feature t is used only to record the time points, 5 features (Ax, Ay, Az, Axyz, ΔA) from the dataset are used for further processing.
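As a minimal sketch (the function name is ours), Eqs. (6) and (7) can be computed in a vectorized way:

```python
import numpy as np

def extract_features(acc):
    """Extend a calibrated (N, 3) acceleration array [Ax, Ay, Az] with
    Axyz (Eq. 6) and dA (Eq. 7), returning an (N, 5) feature array.

    dA for the very first sample is set to 0, since it has no predecessor.
    """
    a_xyz = np.sqrt((acc ** 2).sum(axis=1))          # Eq. (6)
    d_a = np.abs(np.diff(a_xyz, prepend=a_xyz[0]))   # Eq. (7)
    return np.column_stack([acc, a_xyz, d_a])
```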

3.4 Adaptive signal segmentation (ASS)

Signal segmentation aims to divide a signal into several periods with similar statistical features. There are two main approaches: static and adaptive segmentation.

The static method divides the signal into fixed periods and is simple and easy to implement. However, it is less reliable, since the durations of the activities are not always equal. For our work, we therefore chose a posture-based adaptive approach to improve the reliability and clarity of real-time posture recognition. The method proceeds in the following steps.

  1) Define a sliding window of size w = 2f (f is the sampling frequency), as shown in (8).

  2) Calculate the difference \(\stackrel{-}{\Delta A}\) between the averages of the front half and the rear half of the sliding window; a candidate boundary requires this to exceed an empirical threshold th1, as shown in (9).

  3) Calculate the time difference \(\Delta t\) between the middle point of the sliding window and the previous boundary point; this must be greater than 2 s, as shown in (10).

  4) Define a boundary point array bp[]. If the sliding window satisfies both (9) and (10) at the same time, the middle point of the sliding window is taken as a boundary point and saved to the array bp[].

$$ bp[j] = i + f,\quad \left\{ \begin{array}{ll} \forall\, w[i,\, i + 2f] \in \mathrm{signal}\ \Delta A & \quad (8) \\ \overline{\Delta A}[i] = \dfrac{\left| \sum\nolimits_{i}^{i + f} \Delta A - \sum\nolimits_{i + f}^{i + 2f} \Delta A \right|}{f} \ge th1 & \quad (9) \\ \Delta t[i] = t[bp[j]] - t[bp[j - 1]] > 2\,\mathrm{s} & \quad (10) \end{array} \right.$$

where th1 is a threshold defined empirically from the calculated and mixed dataset (∆A) shown in Fig. 2; here th1 = 0.02. i = 1,…,n, where n is the length of the ΔA signal; j = 1,…,m, where m is the length of the bp[] array.

Fig. 2
figure 2

The experimental threshold th1 based on the calculated and mixed \(\stackrel{-}{\Delta A}\) dataset (from all subjects). Note: the labels on the figure: 4 – run; 2 – walk; 1 – motionless (sitting, standing and lying)

Using (9) alone, the signal segmentation is correct only for motionless postures; many redundant boundary points remain for motion postures, as shown in Fig. 3. Adding (10) eliminates these redundant points from the motion actions, as shown in Fig. 4, since over the short interval concerned (less than 2 s) rhythmic posture transitions are ignored within a period of the same motion posture (e.g. a walking or running period).

Fig. 3
figure 3

Based on Eq. (9), the segmentation is correct only for motionless postures. There remain many redundant boundary points for motion postures

Fig. 4
figure 4

Based on Eqs. (9) and (10), the segmentation is correct for both motionless and motion postures

Note that, for ease of understanding, the signal segmentation results shown in Figs. 3 and 4 use the whole signal. In the real-time application, however, the signal was classified every 5 s, meaning the signal segmentation was performed on every 100 samples in this study (f = 20 Hz) rather than on the whole dataset.

In comparison to existing signal segmentation methods, such as [13, 17], the signals we use (ΔA and Axyz) are not sensitive to the sensor's position and orientation, making the system configuration more flexible. For example, the tag can be located on the waist, on the back or in a pocket, and can be oriented horizontally or vertically; the only restriction is that it remains in the same position for the duration of the session.
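Steps 1)–4) of the ASS algorithm can be sketched as follows. This is a simplified illustration with hypothetical names; the real system runs incrementally on 5-s batches rather than over the whole signal.

```python
import numpy as np

def adaptive_segmentation(d_a, t, f=20, th1=0.02, min_gap=2.0):
    """Posture-based adaptive boundary point detection, Eqs. (8)-(10).

    d_a:     1-D dA signal.
    t:       matching time stamps in seconds.
    f:       sampling frequency in Hz (window size w = 2f).
    th1:     empirical threshold on the half-window average difference.
    min_gap: minimum time between consecutive boundary points (2 s).
    """
    bp = [0]  # treat the start of the signal as the previous boundary
    for i in range(len(d_a) - 2 * f):
        # Eq. (9): difference of averages between front and rear half-windows.
        mean_diff = abs(d_a[i:i + f].sum() - d_a[i + f:i + 2 * f].sum()) / f
        # Eq. (10): elapsed time since the previous boundary point.
        if mean_diff >= th1 and t[i + f] - t[bp[-1]] > min_gap:
            bp.append(i + f)  # Eq. (8): the window midpoint becomes a boundary
    return bp[1:]
```

The 2-s guard is what suppresses the redundant boundaries that the amplitude criterion alone produces during rhythmic motion such as walking or running.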

3.5 Single classifier vs. hybrid classifier

Three classifiers were designed and compared, allowing us to determine the best classifier for our system as well as providing a benchmark for wider work. Their details are presented below.

1. Single multi-layer perceptron (MLP) classifier selection

MLP neural networks use gradient descent with backpropagation to learn one or more non-linear hidden layers between the input layer \(X=\left\{{{X}_{i}|x}_{1},{ x}_{2},\dots ,{x}_{d}\right\}\) and the output layer \(Y=\left\{{Y}_{i}|{y}_{1},{ y}_{2},\dots ,{y}_{n}\right\}\). In this study, the input is X = {Xi| Ax, Ay, Az, Axyz, ∆A} and the output defines 5 classes, Y = {Yi| sitting, standing, walking, running, lying}.

A model selection experiment was designed to determine how many hidden layers, and how many neurons per hidden layer, should be used; the training and testing datasets were collected separately from the same subject. A total of 30 models were trained on the same training set, varying from 1 to 3 hidden layers with 6 to 15 neurons in each layer. The 30 models were then evaluated on the same testing set. Finally, a two-hidden-layer single classifier, sMLP(12, 12), was selected from the comparison of experimental results (higher accuracy, 73%, with fewer neurons), as shown in Table 1, where MLP(n), MLP(n, n) and MLP(n, n, n) represent one, two and three hidden layers respectively, and n is the number of neurons in each layer. We only considered the common configuration in which all hidden layers have the same size.

Table 1 Comparison of classification accuracy for a new dataset, via 30 models with different neuron numbers in each of 3 hidden layers
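The 30-model search space described above can be enumerated in a few lines. This is a sketch; the tie-breaking rule shown (preferring fewer neurons at equal accuracy) is our reading of the selection criterion, not a quoted implementation.

```python
# 1-3 hidden layers, 6-15 neurons, with the same size in every hidden
# layer: 3 x 10 = 30 candidate architectures, e.g. (12,) or (12, 12).
candidates = [tuple([n] * layers) for layers in (1, 2, 3) for n in range(6, 16)]

def select_model(accuracy_by_config):
    """Pick the architecture with the highest held-out accuracy,
    breaking ties in favour of the configuration with fewer neurons."""
    return max(accuracy_by_config,
               key=lambda cfg: (accuracy_by_config[cfg], -sum(cfg)))
```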

During the model training stage, sMLP (n, n) learns an activation function \(y=f\left(x,w\right)\) using (11).

$${y}_{j}=\sum_{k=1}^{n}{w}_{k,j}^{2}\bullet f2\left(\sum_{l=1}^{n}{w}_{l,k}^{1}\bullet f1\left(\sum_{i=1}^{d}{w}_{i,l}^{0}\bullet {x}_{i}+{b}_{l}^{0}\right)+{b}_{k}^{1}\right)+{b}_{j}^{2}$$
(11)

where n is the number of neurons in each hidden layer; d is the dimension of the inputs; \({w}_{i,j}^{l}\) are the weights of layer l, representing the connection between nodes \(i\) and \(j\) in adjacent layers; and b are the biases.

The softmax function [29] is selected as the activation function (f1 and f2); it calculates the probability that sample \({x}_{i}\) belongs to each of the classes \({y}_{j}\). The output is the class with the highest probability.
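For reference, the class-probability step can be written as a numerically stable softmax. This is a generic sketch (the helper names are ours), not our exact implementation:

```python
import numpy as np

def softmax(z):
    """Map raw class scores to probabilities that sum to 1.

    Subtracting the maximum first avoids overflow in exp() without
    changing the result.
    """
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_class(scores, labels):
    """Return the label with the highest softmax probability."""
    return labels[int(np.argmax(softmax(scores)))]
```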

2. Two hybrid classifiers design

The adaptive hybrid classifier (hAHC) and a static hybrid classifier (hSHC) design are explained in this section.

The hAHC combines the adaptive signal segmentation (ASS) (a posture-based dynamic window) with the single classifier sMLP and a plurality voting (PV) approach. The hSHC combines a static sliding window (SSW) (fixed-size) with the sMLP and the PV approach. Here, the SSW uses a 1-s window (20 samples in this study). The three classifiers can be summarized as follows:

sMLP = a single MLP (12, 12).

hAHC = ASS + sMLP + PV.

hSHC = SSW + sMLP + PV.

The difference between hAHC and hSHC is their signal segmentation method. Their hybrid implementation procedure is similar and is described below:

  • First, the signal was divided into multiple sub-segments using the ASS or SSW algorithm.

  • Then, each of the sub-segments was classified using the sMLP, with the classification result saved as an array \(P=\left[{p}_{1},\dots ,{p}_{n}\right]\), where n is the number of samples within the sub-segment and pi is the class label of each sample.

  • Finally, each classification result array P was revised using the PV algorithm: the occurrences of each class in \(P=\left[{p}_{1},\dots ,{p}_{n}\right]\) were counted to obtain the majority class label (maxL[i]), and every sample within the array P was then set to that same class label maxL[i].

The hAHC and hSHC classifiers can thus reduce the number of misclassified samples within each segmented signal. For example, suppose 20 samples of a walking action are observed in a sub-segment, of which the sMLP classifies 12 as walking but 8 as standing; the PV algorithm relabels all samples in this segment as walking. The hAHC and hSHC classifiers therefore have the potential to improve the robustness of the classification process by correcting misclassified data.
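The plurality voting step above can be sketched in a few lines (the function name is ours):

```python
from collections import Counter

def plurality_vote(labels):
    """Relabel every per-sample prediction in a sub-segment with the
    segment's majority class, as in the PV revision step."""
    majority, _count = Counter(labels).most_common(1)[0]
    return [majority] * len(labels)
```

For the 20-sample example above, a segment predicted as 12 × walking and 8 × standing is relabelled entirely as walking.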

4 Experiments

Nineteen subjects took part in the experiments. The subjects performed all or part of the 5 predefined test behaviours [sitting, walking, standing, running, lying] while data was collected from a waist-worn tag. The experimental results were displayed as an animated figure in real-time and validated against synchronized videos recorded using a camcorder.

4.1 Experimental protocols

The experimental protocols were designed and performed as follows:

  1) The training dataset collection: Six subjects (Sub1–Sub6) performed the 5 actions, described above, in a stated order within a laboratory environment. The collected datasets were saved as trainSet.

  2) The testing dataset collection: The 19 subjects were divided into 3 groups. Each group comprised 1 steward with 5 or 6 visitors, and performed the 5 actions randomly, based on a simulated security scenario, on an outdoor grass field in a public space. Datasets collected from each group were classified in real-time, as shown in Fig. 5. Concurrently, the 19 datasets were saved separately as testSet.

    Fig. 5
    figure 5

    Screen-shot showing the synchronized video with classified real-time Ax (collected from the steward) animated figures. The steward's actions were standing, walking, running and lying down. The top and bottom figures show the dynamic real-time posture recognition results

  3) A simulated security scenario: The 'visitors' were playing together on the grass field while a 'steward' stood or walked nearby. Suddenly, one of the visitors complained of a headache and lay down on the ground, prompting the remaining visitors to summon the steward with a loud cry of "Help!". In response, the steward ran quickly towards the visitors (top left of Fig. 5) and knelt (face-down lying) to help the unwell visitor (bottom left of Fig. 5).

  4) Model training, testing and comparison: Three models, sMLP, hAHC and hSHC, were trained using trainSet, tested using all 19 testSet datasets, and compared using integrated datasets from 13 unfamiliar subjects.

4.2 Experimental results and discussion

The experiments focused mainly on evaluating the adaptive signal segmentation (ASS) algorithm and comparing the performance of the three classifiers. In addition, we conducted one further experiment to evaluate the signal calibration algorithm.

1. Acceleration calibration algorithm evaluation

First, one subject performed the 5 specified actions in the order walk-stand-run-sit-walk-lying, while both the raw dataset and the calibrated dataset were collected. Given this sequence of activities, we evaluated the classification performance (using the hAHC) on the raw and calibrated signals.

The classification results are presented in Fig. 6, which shows that the acceleration calibration algorithm contributed to reducing the measurement noise, thereby improving the classification accuracy. For example, in the raw dataset, all samples erroneously classified as running could be linked to the large amount of noise present in the sensed data. Additionally, the number of samples required was reduced by calculating the average value of each 'data sensing' packet during the calibration processing phase. The 'data sensing' packets from the IMU were arranged in the TLV (type-length-value) format; the size of each TLV packet varies from 6 to 13 values, so the calibration also contributed to reducing the overall data load.

Fig. 6
figure 6

Classification results based on the calibrated dataset (top of the figure) and the raw dataset (bottom of the figure)

2. Adaptive signal segmentation algorithm evaluation

The experimental results (see Fig. 7) indicated that the ASS algorithm was able to locate the boundary points (bps) correctly for most subjects.

Fig. 7
figure 7

The ASS algorithm works well for most of the different subjects. Here the Ax signal collected from Sub1 and Sub3 is presented

Figure 7 illustrates that the ASS algorithm separated the signals for Sub1 and Sub3 correctly. The test subjects performed the same actions (walking, running, sitting, lying and standing) for different periods of time and with their own natural (and different) characteristics. Additionally, Sub3 undertook a short period of standing before walking, shown within the red circle at the beginning of Fig. 7.

Figure 8 shows that the ASS algorithm missed some bps for signals collected from Sub8 and Sub9, as well as shifting one bp position for Sub9. While this problem can be reduced by increasing the number of subjects used to learn the parameter th1, the algorithm can never be 100% correct for all subjects, so it needs to operate within an agreed 'acceptable error' for parameter selection.

Fig. 8
figure 8

The ASS algorithm resulted in some incorrect or missed bps for Sub8 and Sub9

Additionally, the proposed ASS algorithm was compared to the bottom-up and sliding-window methods included in the Ruptures package, a Python library for offline change point detection [30]. The comparison results, based on signals collected from Sub3 and Sub9, are presented in Fig. 9.

Fig. 9
figure 9

Signal segmentation results from the bottom-up and sliding-window methods based on signals collected from Sub3 & Sub9

Figure 9 demonstrates that the bottom-up approach had performance similar to the ASS algorithm (the lower graphic in Fig. 7 for Sub3 and in Fig. 8 for Sub9); both were better than the sliding-window method. For example, ASS performed better for Sub3 and the bottom-up approach better for Sub9, whereas the sliding-window method missed a bp for Sub3 and shifted bps for both Sub3 and Sub9.

The main advantages of the ASS algorithm were found to be:

  • The combination of the adaptive window method (divided as motionless/motion windows) and the bottom-up approach (where the redundant points were deleted from the motion windows) improved the segmentation accuracy. This was particularly evident for brief periods of posture, such as the short motionless signal from Sub3 at the beginning of Fig. 7.

  • It should also be noted that ASS was run online, for real-time applications, rather than offline.

3. Comparison of the three classifiers’ performance

The three classifiers were evaluated in real-time using an animated figure, as shown in Fig. 10. This showed immediately how many action sections were classified correctly or incorrectly, but could not provide a numerical count, since no class labels were included in the real-time dataset. Therefore, their performance was compared in two steps: (1) viewing the animated figures in real-time; (2) computing the classification accuracy on the all-subjects offline dataset with manually marked class labels.

Fig. 10
figure 10

Real-time classification results for subject-1 using hAHC and hSHC classifiers respectively

Figure 10 illustrates the real-time classification results based on data collected from subject-1; the left side was classified using the hAHC model and the right side using the hSHC model. It is evident that both classifiers did well. The hAHC predicted the correct action for each of the adaptive windows (although some window boundaries shifted slightly). The hSHC predicted incorrect actions for only a few sliding windows (e.g. the walking action appeared within the running period). Nevertheless, the hSHC displayed its results in an overlapping manner that was less clear than the hAHC display. Figure 10 does not include the result from the sMLP classifier, since it shows, for each sample, even more overlap than the hSHC classifier.

The hAHC performance was strongly influenced by the ASS result: in some cases the ASS algorithm shifted a bp or missed some bps, which decreased the classification accuracy dramatically. This is shown in Fig. 11, where part of the walking action was classified as standing because a boundary point was missed by the ASS algorithm (see the lower part of Fig. 8).

Fig. 11
figure 11

Real-time classification result using the hAHC classifier for subject-9. Note that part of walking action was recognized as standing since a boundary point was missed by the ASS algorithm.

The three classifiers were compared using classification reports based on an integrated dataset from 13 unseen human subjects. The experimental results are shown in Table 2 using the 'precision', 'recall', and 'F1-score' for each class.

Table 2 Comparison of classification reports from the three classifiers evaluated using the same testing set collected from unseen subjects

The 'precision' (also called the positive predictive value) measures the ability of the classifier not to label as positive a sample that is negative. The 'recall' (also known as 'sensitivity') measures the ability of the classifier to find all the positive samples. The 'F1-score' is the harmonic mean of the 'precision' and 'recall', with the best score being 1 and the worst 0.
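These three quantities can be computed directly from the confusion counts. The sketch below (with hypothetical labels and predictions, not the paper's data) mirrors the per-class figures reported in Table 2.

```python
def per_class_report(y_true, y_pred, labels):
    """Per-class precision, recall and F1 (harmonic mean of the two)."""
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = (precision, recall, f1)
    return report

# Hypothetical ground truth and predictions for three posture classes
y_true = ["walk", "walk", "run", "stand", "run", "walk"]
y_pred = ["walk", "run",  "run", "stand", "run", "walk"]
print(per_class_report(y_true, y_pred, ["walk", "run", "stand"]))
```

Here 'walk' has perfect precision (nothing else is labelled 'walk') but imperfect recall (one 'walk' sample is missed), showing why both measures are reported alongside their harmonic mean.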

Table 2 indicates that, on the unseen datasets, the hAHC model achieved a slightly higher average 'recall' than the hSHC, and a significantly higher one than the sMLP (hAHC 82%; hSHC 79%; sMLP 71%). The reason is that all three classifiers classify individual signal samples using the single sMLP model, but the two hybrid models, hAHC and hSHC, combine window segmentation with the sMLP and PV algorithms, using the PV approach to revise the per-sample sMLP results within each window, as shown in Fig. 12.
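The PV revision step can be sketched as a plurality vote within each window: every per-sample prediction is replaced by the window's most frequent label, so isolated misclassifications are overruled. The labels below are hypothetical.

```python
from collections import Counter

def plurality_vote(window_preds):
    """PV step: replace every per-sample prediction in one window with the
    most frequent label, overruling isolated sMLP misclassifications."""
    majority = Counter(window_preds).most_common(1)[0][0]
    return [majority] * len(window_preds)

# Hypothetical per-sample sMLP output for one sliding window
raw = ["run", "run", "walk", "run", "run", "stand", "run", "run"]
print(plurality_vote(raw))  # → ['run'] * 8
```

Two of the eight samples were misclassified by the per-sample model, but the voted window label is correct, which is the mechanism behind the hybrid models' higher recall.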

Fig. 12 Comparison of the difference between the hAHC and hSHC models. The top figure shows the hAHC, which used posture-based adaptive window segmentation with the PV approach; the bottom figure shows the hSHC, which used static sliding window segmentation with the PV method

The difference between the two is that the hAHC used the posture-based adaptive window (top of Fig. 12), whereas the hSHC used the 1-s static window (bottom of Fig. 12). Both obtained similar classification accuracy; however, the hSHC suffered from an overlap issue when displaying results in the real-time animated figure. This is why we selected only the adaptive hybrid classifier hAHC for our real-time system.

All three classifiers obtained a higher average 'recall' on datasets collected from the training subjects, but their average recall was lower for unseen subjects, since different subjects perform the same action differently, as shown in Fig. 13. As is evident from these results, it is challenging to train a single model that suits all subjects.

Fig. 13 Different people have different walking gaits, which poses a challenge for developing training models

In terms of how our work advances the state of the art: while the deep learning methods proposed in [31, 32] unify feature learning and classification into one model as a way of enhancing posture recognition performance, the convolutional neural network (CNN) used requires significant data and time for training. For example, Yang et al. [31] used 136,869 and 32,466 samples for training and testing respectively, and spent around 1 h on CNN training and 8 min on testing. In contrast, the hAHC model presented in this study provides a more computationally efficient real-time human posture classification method, which performs better with unseen subjects.

5 Conclusion and future work

This study was motivated by the need to devise better real-time human posture classification methods for monitoring health incidents at home or monitoring security situations in a crowd environment.

Data-driven models are becoming popular for activity recognition, since their built-in error terms quantify the generalization error [33]. However, previous research has used only a single model, which has limitations in some situations. Thus, this work has investigated the potential for combining multiple models with the aim of achieving better performance than any one of them alone.

In order to improve posture recognition accuracy and reliability on unseen datasets, three types of classifiers (one single model, sMLP, and two hybrid models, hAHC and hSHC) were designed and evaluated using a dataset collected from unseen subjects. Experimental results demonstrated that the hAHC model performed best, since it not only obtained the desired accuracy, but also displayed the real-time results clearly using the posture-based adaptive window, which avoids the overlapping issue. In this way the work has advanced the state of the art in classification and posture recognition. Thus, from this work, and supported by the literature review findings, the novelty of this study can be summarized as follows:

  • The development of a novel software-only sensor calibration algorithm that improves accelerometer precision efficiently compared to earlier hardware-based methods.

  • The creation of an innovative hybrid model, hAHC, which combines three complementary classifiers in three layers (ASS + sMLP + PV). Within each posture-based adaptive sliding window separated by the ASS algorithm, the PV approach reduces the number of samples misclassified by the sMLP classifier, as shown in Fig. 12.

  • The adoption of a creative visualization method whereby posture classification performed in real time is visualized clearly in an animated format.
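The three-layer structure of the hAHC can be summarized as a simple composition: the segmenter splits the stream into windows, a per-sample classifier labels each sample, and a vote revises the labels within each window. The stand-in functions below are toy placeholders for the paper's trained ASS, sMLP and PV components, used only to show how the layers fit together.

```python
from collections import Counter

def hybrid_classify(samples, segment, classify, vote):
    """Sketch of the hAHC pipeline: (1) segment the stream into windows,
    (2) classify each sample, (3) revise labels within each window by vote."""
    labels = []
    for start, end in segment(samples):
        per_sample = [classify(s) for s in samples[start:end]]
        labels.extend(vote(per_sample))
    return labels

# Toy stand-ins (illustrative only, not the paper's trained components):
def segment(samples):            # "ASS": a fixed two-window split for the demo
    return [(0, 4), (4, 8)]

def classify(s):                 # "sMLP": a threshold rule on magnitude
    return "walk" if s > 0.5 else "stand"

def vote(preds):                 # "PV": plurality vote within the window
    majority = Counter(preds).most_common(1)[0][0]
    return [majority] * len(preds)

samples = [0.1, 0.2, 0.6, 0.1, 0.9, 0.8, 0.3, 0.7]
print(hybrid_classify(samples, segment, classify, vote))
```

Swapping the `segment` function between an adaptive, posture-based splitter and a fixed 1-s slider is exactly the difference between the hAHC and the hSHC; the other two layers are unchanged.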

Additionally, like many other systems, the output of the analysis can provide real-time alerts when an abnormal activity (e.g. a fall) occurs.

Finally, by way of future plans, we aim to investigate the development of a sub-function that automatically trains an individual model for each new user, as this would provide an efficient way of improving the classification system's robustness and reliability.